Introducing the
Performance Assessment Sampler



Performance assessment, constructed-response, and authentic assessment are terms 
sweeping through educational testing and reform. Much is in development, in prototype, and in 
early use. Every day an educator or testing official somewhere is likely being told to get on top
of what is being done in this area and get started in their state, district, or school. With the 
recent spurt in development, that isn't easy to do quickly. 

This "sampler" is designed for the person who needs to get a handle on these new assessment
efforts. It follows the pattern of the ETS Policy Information Center's previous "workbook"
on national educational standards*, reproducing excerpts that give at least an acquaintance
with a project, and information on where to go to learn more. This is by no means an
exhaustive inventory of efforts at alternative assessment going on in the United States, or at
Educational Testing Service. Rather, it is a sampler that attempts to represent a broad range
of efforts in this area.

Paul E. Barton 
Director 

Richard J. Coley 

Senior Research Associate 



Acknowledgments

We are indebted to all of the individuals and organizations who gave us permission to 
reproduce their materials. This permission, when required, is specified on the page introducing 
the particular project or material. 

Carla Cooper provided desktop publishing services and preparation for printing.



* National Standards for Education: What They Might Look Like. A Workbook. Princeton,
NJ: Policy Information Center, Educational Testing Service, 1992. 



Aquarium Problem and Teacher Guidelines 



New Standards Project 

The New Standards Project is a joint program of the National Center on Education and 
the Economy, which is based in Rochester, NY, and the Learning Research and Development
Center at the University of Pittsburgh. The project has attracted the participation of 17 states 
and six large school districts who already were far along in designing and administering a new 
generation of assessments based on performance rather than multiple-choice tests. 

The system created by the Project will set a high standard of performance for all students.
The assessments will emphasize the ability to think well, to demonstrate a real understanding
of subjects studied and to apply what one knows to the kind of complex problems
encountered in life. The Project will employ portfolios, exhibitions, projects, and timed
performance examinations, all based on the use of real-life tasks that students are asked to do
alone and in groups.

In establishing content standards, the Project is drawing on the work of national bodies 
such as the National Council of Teachers of Mathematics and on curriculum frameworks and 
goals developed by the states. It will also work to establish international benchmark standards 
for student performance. Work has begun on the tasks that will constitute the core of the 
examinations and the first exams will be available in 1993-94. 

For more information on the New Standards Project, write to:

Learning Research and Development Center 
University of Pittsburgh 
3939 O'Hara St., Room 408 
Pittsburgh, PA 15260 

or call 412-624-8319



The "Aquarium Problem" is reproduced with the permission of the New Standards Project.
The fish illustrations are reproduced with the permission of T.F.H. Publications, Inc.,
Neptune, New Jersey.
Learning Research and Development Center at the University of Pittsburgh

and the

National Center on Education and the Economy



May 15, 1992 

Dear Student: 

Today you will be part of an exciting plan called the
New Standards Project. We are looking at new ways of 
teaching, learning and testing. Our plan is to create 
interesting learning activities for students. We hope that 
these activities will give you a chance to show what you 
know and what you can do in math. 

All across the country, fourth graders from many 
communities are helping the New Standards Project by 
working on these learning activities. By showing and 
explaining your best thinking, you will help us improve the 
activities before trying them with other students. 

We thank you for your help and for being such an 
important part of the New Standards Project. 



Sincerely, 




Philip Daro 

Director for Mathematics 






THE AQUARIUM 



Imagine that your school principal asks you to do a special 
job and gives you these written directions: 



Your class will be getting a 30 gallon 
aquarium. The class will have $25.00 to spend 
on fish. You will plan which fish to buy. Use 
the Choosing Fish for Your Aquarium brochure 
to help you choose the fish. The brochure tells 
you things you must know about the size of the 
fish, how much they cost and their special needs. 

Choose as many different kinds of fish as 
you can. Then write a letter to me explaining 
which fish you choose. In your letter, 

1. tell me how many of each kind of fish
to buy

2. give the reasons you chose those fish

3. show that you are not overspending
and that the fish will not be too
crowded in the aquarium.



[Choosing Fish for Your Aquarium brochure and Chart for Freshwater Fish: not legible in this copy.]

Student Reflections, Ideas 



You can help us make these learning activities even better.
Think about each of the following questions, and write to 
us what you honestly think. Be as clear as you can (you 
might want to give us examples of what you mean). 



What did you enjoy about the task? 



What did you not like about the task? 



How is this task like other activities you do in your class? 
How is it different? 
Teacher Guidelines

THE AQUARIUM
Guidelines for the Teacher 



Purpose 

In The Aquarium task, students use their knowledge of mathematics to solve a real world 
problem. Students use logical and numerical reasoning about money, measurement and 
realistic conditions to decide how best to stock an aquarium within the constraints of the 
situation. 

This task calls for logical and numerical reasoning and justification of that reasoning.
These are contained in Standard 1 (Mathematics as Problem Solving), Standard 3
(Mathematics as Reasoning), and Standard 4 (Mathematical Connections) of the NCTM
Curriculum Standards for Grades K-4. Students apply understanding of measurement as
described in Standard 10 (Measurement).

This task is designed to assess students' mathematical thinking and their use of information
contained in a brochure which features a chart, not their ability to read the brochure and
chart independently.

Materials and Resources 

• Students should have easy access to calculators 

• Paper, pencils 

• Extra copies of Choosing Fish for Your Aquarium brochure and Chart for 
Freshwater Fish (the last page of the task booklet) 

• Task booklet 

Time Required 

The Introductory Activities take about one 40-minute class period; anticipate two class 
periods for the assessment itself. Allow enough time for the task so that you and the
students feel that they have been able to do their best work. 

Ideas for post-assessment activities are included at the end of these guidelines. You may 
wish to use these to extend the exploration of the mathematical ideas contained in the 
assessment task. 

Introductory Activity 

Introductory activities should interest students in the task context (fish and aquariums) and 
ensure that all students have sufficient knowledge and curiosity about the context to enable 
them to work on the task. You may want to start by helping the whole group generate and
discuss some considerations in selecting fish for an aquarium. Guide students to include 
these criteria: price of fish, choosing types of fish that can live together, allowing sufficient 
room and oxygen for all the fish. 



After the discussion, help students read the Choosing Fish for Your Aquarium brochure.
Help them understand the chart and help them to practice with it. Have students notice and
tell all that they can about a particular fish listed on the chart. Have students find fish that
must live in groups of four or more ("schools"). Other possible questions: "My friend has
a fish with red on it, what could it be?" "How much will it cost to buy four 'Guppies'?"
"What is the shortest fish? The longest?" "Is there a yellow fish that is longer than 2
inches?" "Which fish is the most expensive?"

To build students' interest and readiness, a Think-Pair-Share activity can be helpful: 

• Write the words "fish" and "aquarium" on the chalkboard. Ask students to
think about these words, then to write down everything that they know
about "fish" and "aquarium". Have students write down complete ideas,
not just single words.

• Allow students to form pairs and share their lists.

• Now ask students to share as an entire group. Record their ideas and keep
the group's list visible while students work on the assessment task.



Assessment Task 

Read the Letter to the Students at the beginning of the task. This should 
help communicate that a good effort is expected of all students. 

Introduce the Aquarium task. You may want to refer back to your
introductory activities. Read the task aloud with the students, or read it to
them. Explain any words you think may be unclear. Make sure that
students understand the considerations for choosing fish and that they know
what to include in their letter to the principal.

Each student should work independently to create a workable solution, and
write a letter explaining his or her choices. Your interaction with the
students should be limited to making them comfortable with the assignment
and to normal classroom management.

When the students have finished the assessment task, ask them to respond to
the Student Reflections, Ideas questions on the last page of the task
booklet.



Post-Assessment Suggestions



The purpose of the post-assessment activity is to provide students with an opportunity to 
review how they solved the problem and to learn from their work on this task. 



Suggested Activities: 



• A "pair-share" procedure, similar to the one in the introductory activity, is
one technique for doing such a reflective review. In this activity, students share
solutions with their classmates, think about the crucial elements of the task, revise
or at least revisit their work, and finally, reflect upon what they think they
learned as a result of participating in this activity.

Begin by talking with students about how we often become better problem solvers by 
reviewing how we and others solved a particular problem. Mention that everyone is to be 
commended for the effort they put forth in working on this task. In order for us to improve 
our own problem solving skills, we are going to share our work. 

The activity could continue something like this: 

"First, we'll share with partners. Each of you will exchange your plan with a 
partner. As you read your partner's solution to the problem, note at least two 
things that you think showed good thinking. Also, write at least one or two 
questions which you might ask to better understand what your partner did to solve 
the problem or which might help your partner improve his solution to the problem.
Share your responses with your partner." 

To help students review each other's work, remind them of the critical task parameters
(30 gallon tank and $25.00 limit) and the important information about the fish to be chosen
(size, cost, and special qualities).
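
A teacher who wants a quick way to check plans against these parameters could use something like the sketch below. It is illustrative only. The fish names, prices, lengths, and schooling flags are hypothetical stand-ins for the brochure chart, which is not legible in this copy, and the crowding rule (`max_inches_per_gallon`) is an assumed placeholder rather than the rule the brochure states. Only the $25.00 budget, the 30 gallon tank, and the "groups of four or more" condition come from the task materials.

```python
from dataclasses import dataclass


@dataclass
class Fish:
    name: str
    price: float             # dollars per fish (hypothetical value)
    length_in: float         # length in inches (hypothetical value)
    schooling: bool = False  # True if the fish must live in groups of four or more


# Hypothetical catalog; replace with the real brochure chart.
CATALOG = {
    "Fish A": Fish("Fish A", 1.50, 2.0, schooling=True),
    "Fish B": Fish("Fish B", 3.00, 3.0),
    "Fish C": Fish("Fish C", 5.00, 5.0),
}


def check_plan(plan, catalog, budget=25.00, gallons=30, max_inches_per_gallon=1.0):
    """Return a list of problems with a plan; an empty list means it passes.

    `plan` maps a fish name to the number of that fish to buy.
    `max_inches_per_gallon` is an assumed crowding rule, not the brochure's.
    """
    problems = []
    total_cost = sum(catalog[name].price * count for name, count in plan.items())
    total_length = sum(catalog[name].length_in * count for name, count in plan.items())

    if total_cost > budget:
        problems.append(f"over budget: ${total_cost:.2f} > ${budget:.2f}")
    if total_length > gallons * max_inches_per_gallon:
        problems.append(f"too crowded: {total_length} inches of fish in {gallons} gallons")
    for name, count in plan.items():
        if catalog[name].schooling and 0 < count < 4:
            problems.append(f"{name} must be bought in groups of four or more")
    return problems


good_plan = {"Fish A": 4, "Fish B": 2, "Fish C": 1}
bad_plan = {"Fish A": 2, "Fish C": 5}
print(check_plan(good_plan, CATALOG))
print(check_plan(bad_plan, CATALOG))
```

The check reports every violated condition rather than stopping at the first, which mirrors the task's request that students show both that they are not overspending and that the fish are not crowded.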

Have students rotate partners within their groups. Students will need to keep track of the 
good points and questions/suggestions for each partner.

After students have had an opportunity to share with their partners, reconvene the entire 
class. Pose questions such as: 

What surprising or interesting things did you learn?
What else would you like to share with the class?

What would you do differently if you were to revise your solution for the task?

You might then ask students to go back and actually revise their solutions. Finally, you might
ask students what they learned from working on the Aquarium task. Students should 
record their responses. 



The PACKETS™ Program



PACKETS™ is a major new program of Educational Testing Service to develop performance-based
activities that teachers can use as part of classroom instruction and as the foundation
for documenting the learning process.

The program contains a series of high quality, nationally field-tested performance
assessment activities or tasks. These materials are packaged by specific subjects and grade
levels. PACKETS™ materials include activities that:

• Do not presume one correct way of thinking about the problem or just one 
right answer. 

• Require students to utilize a broad spectrum of knowledge, reasoning,
problem-solving, and communication.

• Require students to work in groups in a cooperative learning environment.

Although the PACKETS™ program will cover the K-12 spectrum across several content
areas, the first set of materials is in middle school mathematics. The Middle School Math
PACKETS™ program is currently in use in a limited number of field-test classrooms, and will
be available nationally for the 1994-95 school year.

The materials provided here include some overall information on the program, along
with examples of activity, feedback, and assessment materials.

For more information, contact: 

Nancy Katims 
Mail Stop 37-B
Educational Testing Service 
Rosedale Road 
Princeton, NJ 08541 



These materials are copyrighted by Educational Testing Service and are reproduced here with 
permission. 

The chart, "Race of the Sexes: What Lies Ahead," presented in the "PACKETS Times," is
copyrighted by the New York Times. Reprinted by permission. 
Women runners threaten to overtake men



If women's running 
times continue to improve, 
top women may soon catch 
up with the best men. 

In fact, women may 
even outrun men someday, 
according to two scientists. 

This prediction is based 
on the rate at which 
women's race times have 
been improving. Since 1920, 
women's times have 
improved much more than 
men's. 

Researchers say that the
best female runners should 
run marathons as quickly as 
men by 1998. Women 
should catch up with men 
in shorter track events by 
the middle of the next cen- 
tury. 

These predictions are 
based on a comparison of 
trends in men's and 
women's world records 
over the past 70 years. 
Based on these patterns, 
projections are made into 
the future. 

The results were pub- 
lished by Dr. Brian J. Whipp 
and Dr. Susan A. Ward in 
the British journal Nature.
The two scientists teach at 
the University of California 
at Los Angeles. 

Dr. Peter Snell, an exer-
cise physiologist at the
University of Texas, does
not accept the results. "I'd
agree that there's a way to
go yet in women's perfor-
mance, but if they're sug-
gesting that women will
approach men, that's ludi-
crous."

But the two researchers 
said the women's trend has 
been too consistent to 
ignore. 

Whipp said that before 
looking at the data, he did 
not think women would 
ever catch men. But now, he 
thinks, "Men and women
might be running equiva- 
lent speeds in the next cen- 
tury." 

He added, "This is not
me talking. It's the data."

In 1954, when Roger 
Bannister became the first 
man to run a four-minute 
mile, Diane Leather became 
the first woman to run a 
five-minute mile. If they 
had been in the same race, 
she would have finished 
320 meters behind 
Bannister. 

Today, the top female 
runner would finish only 
180 meters behind the 
fastest man, according to
Whipp. 

In the marathon, 
women's times have
improved about 61 percent 
since 1955. Men's perfor- 
mance has improved only 
18 percent. 

Women have come a
long way, but there is still a
long way to go to catch up.
The fastest female runners
today would not even qual-
ify for the men's track
events in the Olympics.

In the marathon, the 
men's world record is 
2:06.50, while the women's
is 2:21.06. By marathon 
standards, this is a huge 
difference. 

Many people doubt that 
women will ever catch up. 

"Women will never, 
ever catch up to the men," 
said Frank Lebow, presi- 
dent of the New York Road 
Runners Club. "Maybe on 
paper this looks good, but 
I've been to 100 marathons 
around the world, and I've 
seen all the women runners. 
Women will never pass 
men. Never, never." 

Joan Benoit Samuelson, 
the 1984 Olympic marathon 
champion and record- 
holder among American 
women, said that women 
might get closer to men's 
times, but would never beat
them. 

"Men have had a lot 
more time to evolve in the 
sport, and since they've got 
that jump start, they'll be 
hard to beat now," she said. 
"You also have the male 
ego to consider, and that's 
going to keep men going." 

According to Snell, who 
won three Olympic gold 
medals in the 1960s for run- 
ning, men also have physi- 
cal advantages. Men have 
larger muscles, stronger
bones, and a smaller per- 
centage of body fat. Men 
also have more red blood 
cells, so they can get more 
oxygen to their muscles. 

Snell thinks women's 
improvements are due to 
social changes. "Finally, 
women are starting to get 
out and do the same things 
as men." 

Patti Sue Plumer, who
in 1990 was ranked No. 1 in
the 3000-meter and 5000- 
meter events, does not 
think that physical advan- 
tages are that important. 
"As an athlete, I've learned 
that the mind plays a much 
stronger role than anything 
physiological," she said. 

Perhaps the debate will 
be settled only by time. For 
female runners, the race has 
only just begun. 



This special issue of the PACKETS Times was 
published as part of an ETS project on newspaper- 
based performance activities for mathematical 
instruction and assessment. 
"Big Ideas" Activity



PACKETS™ Project
The Fast Track 

Do you think that women may soon outrun men? How fast do you think
women and men will run in 100 years? In 200 years?

The editor of the school newspaper has decided to write an article titled "The
Fast Track." The article will include predictions and comparisons of the 
speeds for women and men in the 200-meter run of future Olympic Games. 
The editor has asked your class to predict what the speeds might be for the 
next 50 Olympic Games (the next 200 years). 

Write up your predictions and conclusions for the editor. The editor will 
need to explain and justify the predictions in the article. Therefore, include 
any graphs, charts, or other materials that would help the editor understand
the reasoning for your predictions. 



Gold Medalists in the Women's 200-Meter Event

Year  Name, Country                            Time in  Speed
                                               seconds  in mph
1988  Florence Griffith-Joyner, United States  21.34    20.9
1984  Valerie Brisco-Hooks, United States      21.81    20.5
1980  Barbel Wockel, E. Germany                22.03    20.3
1976  Barbel Eckert, E. Germany                22.37    20.0
1972  Renate Stecher, E. Germany               22.40    19.9
1968  Irena Szewinska, Poland                  22.5     19.8
1964  Edith McGuire, United States             23.0     19.4
1960  Wilma Rudolph, United States             24.0     18.6
1956  Betty Cuthbert, Australia                23.4     19.1
1952  Marjorie Jackson, Australia              23.7     18.8
1948  Francina Blankers-Koen, Netherlands      24.4     18.3
Gold Medalists in the Men's 200-Meter Event

Year  Name, Country                    Time in  Speed
                                       seconds  in mph
1988  Joe DeLoach, United States       19.75    22.6
1984  Carl Lewis, United States        19.80    22.5
1980  Pietro Mennea, Italy             20.19    22.1
1976  Donald Quarrie, Jamaica          20.23    22.1
1972  Valeri Borzov, USSR              20.00    22.3
1968  Tommie Smith, United States      19.83    22.5
1964  Henry Carr, United States        20.3     22.0
1960  Livio Berruti, Italy             20.5     21.8
1956  Bobby Morrow, United States      20.6     21.7
1952  Andrew Stanfield, United States  20.7     21.6
1948  Mel Patton, United States        21.1     [illegible]
1936  Jesse Owens, United States       20.7     21.6
1932  Eddie Tolan, United States       21.2     21.1
1928  Percy Williams, Canada           21.8     20.5
1924  Jackson Scholz, United States    21.6     20.7
1920  Allan Woodring, United States    22.0     20.3
1912  Ralph Craig, United States       21.7     20.6
1908  Robert Kerr, Canada              22.6     19.8
1904  Archie Hahn, United States       21.6     20.7
1900  Walter Tewksbury, United States  22.2     20.1
Examples of Pre-Activity Readiness Problems
for PACKETS™ Middle School Math



1. For each year during the past six years, estimate how fast you have run. Sketch
a graph of your running speeds across those years.

2. In a sentence or two, write a definition of a trend. Provide an example. Then
draw a graph that shows this trend.

3. A friend claims that 550 meters per minute is the same as 18.6 mph. Write a
sentence or two that either justifies or disputes your friend's claim.

4. Approximately how fast did women run the marathon in 1955?

5. If you rode a bike 100 yards in 20 seconds, what is your speed in miles per hour?
What is your speed in meters per minute?

6. If you rode a bicycle faster than any man has run a marathon, but slower than the
fastest woman has ever run 1500 meters, how fast might you be riding?
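
The unit conversions behind problems 3 and 5 can be sketched in a few lines. This is a minimal illustration, not part of the PACKETS materials. The function names are made up for the example, and the only outside facts used are the standard definitions of the mile (1609.344 meters) and the yard (0.9144 meters).

```python
METERS_PER_MILE = 1609.344  # international mile, exact by definition
METERS_PER_YARD = 0.9144    # international yard, exact by definition


def m_per_min_to_mph(meters_per_minute):
    """Convert a speed in meters per minute to miles per hour."""
    return meters_per_minute * 60 / METERS_PER_MILE


def speed_mph(distance_m, seconds):
    """Average speed in miles per hour for a distance in meters and a time in seconds."""
    return distance_m / seconds * 3600 / METERS_PER_MILE


def speed_m_per_min(distance_m, seconds):
    """Average speed in meters per minute."""
    return distance_m / seconds * 60


# Problem 3: is 550 meters per minute the same as 18.6 mph?
print(round(m_per_min_to_mph(550), 1))

# Problem 5: 100 yards in 20 seconds.
distance = 100 * METERS_PER_YARD
print(round(speed_mph(distance, 20), 1), round(speed_m_per_min(distance, 20), 2))
```

The same `speed_mph` function can be applied to the 200-meter times in the medalist tables. Results may differ from the printed speeds in the last digit, depending on how the original values were rounded.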



Examples of Post-Activity Exercises for 
PACKETS™ Middle School Math 



I. Follow-Up Problems

1. a. Using the information that is given in the graph in the newspaper,
predict the running speed of the 1992 world record holder for each
of the three events.

b. Compare your predictions with the actual 1992 world records.

c. In a sentence or two, describe any significant differences between
your predictions and the actual records.

2. If you ran the 200 meter event in 21 seconds, what would your average speed 



3. a. What is the average difference in running speeds of the gold
medalists in the men's and women's 200 meter Olympic events
for the Olympics between 1948 and 1988?
b. Graph the average difference.

4. Describe in your own way the data given for the gold medalists in the men's 200
meter event without giving any of the actual numbers in the description.

5. In doing this activity, what tools or resources would you have liked that might 
have been helpful? Describe in a few sentences how these tools might have 
changed your predictions. 

6. By how many percentage points did the running speeds of the gold medalists in 
the women's 200 meter Olympic event increase between 1948 and 1988? 
II. Exploration Activity

The exploration activity is an activity in which the students explore the pure mathematics
of the activity in a concrete or graphic representation. For "Fast Track," the students
might look for patterns in sequences of the following types:



[Figure: sequences shown as drawings and as a grid of the numbers 1 through 52 with some numbers circled. The sequences legible in this copy follow; parentheses mark circled numbers.]

2, 6, 12, 16, 20, 24, 30, 34, 38, ...

1, (2), 3, 4, (5), 6, 7, 8, 9, (10), 11, 12, 13, ...



III. Application Activity

The application activity encourages the students to extend the ideas developed in the 
"big ideas" activity. For example, the application activity for "Fast Track" might ask the 
students to make projections in a new content area such as world population growth. 
Description of Mathematical Approaches
for the 
"Fast Track" Activity 



Units of Analysis (Simple vs Composite) 

What are the units people think about when working on this problem? Sometimes
people use small simple units such as one year, one Olympics, individual running
speeds or individual running times. Other times they use larger, composite units, such
as blocks of data, patterns or trends.




Differences vs Ratios 

How do people think about change? Sometimes people think of change in terms of
differences (absolute change). Other times they think more in terms of percentages or
ratios (relative change). Change can be relative to time intervals (e.g., years) or change
can be relative to running speeds or running times. Complex ratios can be relative to
both time intervals and running speeds/times.
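
To make the distinction concrete, the sketch below computes both kinds of change for the women's 200-meter speeds in 1948 (18.3 mph) and 1988 (20.9 mph), taken from the medalist table. The function names are chosen for this illustration only.

```python
def absolute_change(old, new):
    """Difference between two values (absolute change)."""
    return new - old


def relative_change_pct(old, new):
    """Change as a percentage of the starting value (relative change)."""
    return (new - old) / old * 100


women_1948 = 18.3  # mph, from the women's medalist table
women_1988 = 20.9  # mph, from the women's medalist table
olympics_between = (1988 - 1948) // 4  # ten four-year intervals

difference = absolute_change(women_1948, women_1988)
percent = relative_change_pct(women_1948, women_1988)

print(f"absolute change: {difference:.1f} mph")
print(f"relative change: {percent:.1f} percent")
# A ratio relative to a time interval: average change per Olympics.
print(f"per Olympics:    {difference / olympics_between:.2f} mph")
```

A student who reasons with differences would extend the per-Olympics figure forward, while one who reasons with ratios would apply a percentage growth rate, and the two approaches diverge as the projection gets longer.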



Little-Picture vs Big-Picture

How do people think about making predictions based on past performances?
Sometimes people think about the problem by projecting from little-picture information
what a next one will be, then a next one, a next one, and so on. Other times they
extrapolate from big-picture information what some future situation will be and then
attempt to fill in the holes.




Linear vs Non-Linear 



How do people think about trends? Sometimes people think about the problem in a 
linear fashion. They see a constant rate of increase and project it in a steady-state 
fashion. Other times people think about the problem in non-linear ways. They think in 
terms of a limiting factor or a dynamic rate of change. This may be expressed as
leveling off, maxing out or peaking. 
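
The two ways of thinking can be compared with a short sketch. The linear model below is an ordinary least-squares line through the women's 200-meter speeds from the medalist table. The "leveling off" model is one simple illustration among many: it assumes a ceiling speed and closes a fixed fraction of the remaining gap at each Olympics. The `ceiling` and `fraction` values are hypothetical and are not drawn from the PACKETS materials.

```python
# Women's 200-meter gold-medal speeds (mph), from the medalist table.
WOMEN_MPH = {
    1948: 18.3, 1952: 18.8, 1956: 19.1, 1960: 18.6, 1964: 19.4, 1968: 19.8,
    1972: 19.9, 1976: 20.0, 1980: 20.3, 1984: 20.5, 1988: 20.9,
}


def fit_line(points):
    """Least-squares line through a {year: value} mapping; returns (slope, intercept)."""
    xs = list(points)
    ys = [points[x] for x in xs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project_linear(points, year):
    """Linear thinking: extend the fitted constant rate of increase to `year`."""
    slope, intercept = fit_line(points)
    return slope * year + intercept


def project_leveling(last_value, ceiling, fraction, steps):
    """Non-linear thinking: close `fraction` of the gap to `ceiling` at each step."""
    value = last_value
    for _ in range(steps):
        value += fraction * (ceiling - value)
    return value


slope, _ = fit_line(WOMEN_MPH)
print(f"fitted increase: {slope:.3f} mph per year")
print(f"linear projection for 1992: {project_linear(WOMEN_MPH, 1992):.1f} mph")
print(f"linear projection for 2188: {project_linear(WOMEN_MPH, 2188):.1f} mph")
# Hypothetical ceiling of 23.0 mph, closing 10 percent of the gap per Olympics.
print(f"leveling projection for 2188: {project_leveling(20.9, 23.0, 0.10, 50):.1f} mph")
```

Over one Olympics the two models give similar numbers. Over the 50 Olympics the task asks about, the straight line keeps climbing while the leveling model stays below its ceiling, which is the contrast this dimension is meant to surface in student work.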




Numerical Patterns vs Graphical Patterns



How do people think about data? Sometimes people think in terms of numerical data,
such as numbers, sequences of numbers (e.g., lists or tables) or composites of numbers
(e.g., sums or averages). Other times they think in terms of visual data, such as points,
sequences of points (e.g., patterns or graphs) and composites of points (e.g., slopes).





Independent Data vs Comparative Data 

How do people think about more than one set of data? Sometimes people respond to 
men's and women's data separately, project into the future, then compare the two. 
Other times they consider differences between the two sets of data, then project the 
differences. 
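
The comparative approach can be sketched as follows: take the difference between the men's and women's speeds at each Olympics, then look at how that difference is changing. The speeds are read from the two medalist tables for 1952 through 1988. The men's table is badly garbled in this copy, so the assignment of men's rows to years and the reading of one entry as 22.3 mph are assumptions. The 1948 men's speed is illegible and is left out. The straight-line extrapolation at the end is only an illustration of the technique, not a prediction endorsed by the materials.

```python
# Speeds in mph for 1952-1988. Men's years and the 1972 value are assumed readings.
WOMEN_MPH = {1952: 18.8, 1956: 19.1, 1960: 18.6, 1964: 19.4, 1968: 19.8,
             1972: 19.9, 1976: 20.0, 1980: 20.3, 1984: 20.5, 1988: 20.9}
MEN_MPH = {1952: 21.6, 1956: 21.7, 1960: 21.8, 1964: 22.0, 1968: 22.5,
           1972: 22.3, 1976: 22.1, 1980: 22.1, 1984: 22.5, 1988: 22.6}


def gaps(men, women):
    """Men's speed minus women's speed for every year present in both tables."""
    return {year: men[year] - women[year] for year in sorted(men) if year in women}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def fit_line(points):
    """Least-squares line through a {year: value} mapping; returns (slope, intercept)."""
    xs = list(points)
    ys = [points[x] for x in xs]
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


gap_by_year = gaps(MEN_MPH, WOMEN_MPH)
slope, intercept = fit_line(gap_by_year)

print(f"average gap: {mean(gap_by_year.values()):.2f} mph")
print(f"gap trend:   {slope:.4f} mph per year")
# Year at which the fitted line for the gap reaches zero.
print(f"fitted gap reaches zero around {-intercept / slope:.0f}")
```

A student using the independent approach would instead fit the men's and women's data separately and compare the two projections afterward.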
Sample Student Product
Group One

[Handwritten student work: not legible in this copy.]
Fast Track
Group One 



Sample Product Cover Sheet 

(Conditions under which work was produced) 



Who contributed to this work? What is their background? 

(e.g., age, grade level, profession, etc.) 





How much time was spent on the activity? 




Draft

(Clarify/Solve/Write Up)



hrs.
Additional Work



What resources were used? 

(e.g., reference books, calculators, etc.) 



Additional Comments: 
Instructional Feedback Option
Fast Track 
Student Group One 



Dear Students, 

Thank you for taking the time to make predictions for my article, The Fast 
Track. However, the information you sent is not complete enough for the basis of 
an article. 

I like the argument that you gave on the first page, but wasn't completely
sure that I understood where each of the numbers came from. (Where did 0.129 
come from? Have you confused years with Olympics? Remember, Olympics 
occur every four years!) In the tables, it looks as if you calculated differences 
and then averaged them. Unfortunately, I need more than just the time when 
women will surpass men. I need predictions for men and women for the next 50 
Olympics. Your predictions for the next eight Olympics are interesting, but not 
quite enough for me to use. 

To develop this work further, try to project performance for 50 Olympics.
The data need to be more readable, too. It would help if information was labelled 
and some explanation was given. 

You have made a good start on this project. However, I need clearer and 
more complete information if I am to use it for my article. 



Thanks, 



The Editor 
Dimensions of Math PACKETS
Interpretation Framework

[Diagram relating "Mathematical Content in Response" and "Utility of Response" to "Description" and "Evaluation". The cell text is only partly legible in this copy: "... was used?", "... was the math used?", "... purpose(s) does the ...", "... purpose(s) stated in the activity?"]
PACKETS™ Assessment

Steps for the Teacher 



GATHER PROCESS INFORMATION 

(Mathematical and Social) 

• Observe students during the activity. 

• Listen to and question students during their presentations. 

• Lead/participate in classroom discussions. 



DESCRIBE THE PRODUCT (Ways of Thinking) 

• Analyze the mathematics used in the product. 



EVALUATE THE PRODUCT (Windows on Quality) 

• Assess the utility of the product from the point of view of the audience. 

• Describe the strengths and weaknesses of the product. 

• Suggest next steps to the student (i.e., the extent of revisions needed to 
make the product acceptable). 
WINDOWS ON QUALITY OF THE PRODUCT



The overarching consideration when assessing a piece of work should be how well it 
accomplishes the expressed purpose of the activity, the task at hand. One should look at the 
approach used and the results obtained, in relation to this purpose. To do this, one should 
keep in mind what students were asked to do, for whom, and for what purpose.



How well does the product accomplish the purpose of the activity? 

(Does the product contain everything that was asked for or a substitute
that meets the expressed needs?) 

How well is the product supported by appropriate mathematics? 

How appropriately and effectively are the mathematical concepts used? 

How appropriately and effectively are the mathematical operations and procedures used?

How appropriately and effectively are the mathematical representations used? 

How skillfully are the mathematics used? 

How understandable is the product to the intended audience? 

To what extent is the product - 

• clear • coherent 

• consistent • well-organized 

• complete • aesthetic? 

How reasonable is the product for the real-world situation? 

How logical are the solution paths from "the givens" in the problem to the final product?

How is the solution supported, justified, or explained? 

How much and what kind of information was used and/or generated? 

Was information omitted, ignored, distorted, or invented? If so, how? 

Is there something special in this product? 

Connections (within mathematics, across disciplines, or to the real-world) 

Awareness of assumptions, sources of error, or limitations 

Recognition of alternative approaches 

Extensions and generalizations 

Uniqueness 

Anything else 



Performance Level Guides for
Open-Response Common Items 
Grade 8 



Kentucky Department of Education 

A major part of education reform in Kentucky is the Kentucky Instructional Results 
Information System (KIRIS). The annual assessment, at grades 4, 8, and 12, has three parts: 
multiple-choice and short-essay questions; performance tasks that call for students to work 
together in groups or individually to solve simulated, real-life problems; and portfolios that
present each student's best work collected throughout the school year. 

As part of the 1991-92 KIRIS Assessment, each eighth grader took three open-ended 
questions in reading, mathematics, science and social studies. The science and social studies 
questions, together with examples of student responses at the distinguished, proficient, appren- 
tice, and novice performance levels, are reproduced here. 

For more information on Kentucky's assessment program, contact: 

Mr. Edward Reidy 

Kentucky Department of Education

500 Mero Street 

Capitol Plaza Tower 

Frankfort, KY 40601 



These materials were reproduced with the permission of the Kentucky State Education 
Department. 



World-class Standards...
for
World-class Kids

In Kentucky, we just expect more!

Performance Level Guides
for
Open-Response Common Items

GRADE 8

Kentucky Department of Education
Thomas C. Boysen, Commissioner



Introduction 



As part of the 1991-92 KIRIS Assessment, each eighth grader took
three open-ended questions in reading, mathematics, science and social
studies. Those questions, together with examples of student responses
at the distinguished, proficient, apprentice, and novice performance
levels, are provided in this booklet. In addition to the open-ended
questions in each of the four content areas, students were required
to compile a writing assessment portfolio to demonstrate their
proficiency in writing. The table of contents for the portfolio, along
with examples of student writing at each performance level, are also
included.



Grade 8 

Science Open-Response Common Items 

Open-response 2: 

How would life and the conditions on earth be different if all bacteria and fungi became extinct?
Explain the changes that might occur and give as much detail as possible.

Open-response 3:

The table below shows the information a researcher has gathered about the students in the
seventh grade in a school. Use this information to answer the questions that follow the table.











Student        Sex   Age   Best Subject   Number of Brothers & Sisters   Grade in Reading
Adams          M     12    M              2                              80
[illegible]    M     12    R              0                              85
Brown          M     13    M              2                              76
Burton         F     12    R              2                              84
CajTwa         F     12    M              0                              87
Davenport      M     13    S              1                              85
Fenwick        M     12    M              0                              79
Franklin       M     12    R              2                              77
Garvey         M     12    M              1                              81
Harris         F     12    S              3                              70
Kattsy         M     12    S              0                              83
LaFontaine     M     12    M              1                              80
[illegible]    F     12    H              1                              76
Moore          F     13    S              1                              82
Peterson       M     12    M              1                              86
PoN¥>          F     12    H              3                              80
Sabtn          M     13    S              0                              90
Smith          F     12    M              2                              83
Washburn       F     12    R              4                              72
Weinberg       M     13    H              3                              75
Wilson         F     13    M              2                              79

Sex: M-Male, F-Female

Best Subject: H-History, M-Math, R-Reading, S-Science



There are five groups of students based on the number of brothers and sisters each student 
has. Compute an average reading score for each group. Show your work. 

• Make a graph of your results. 

• What conclusion can you draw from your results?
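The averaging that Open-response 3 asks for can be sketched in a few lines of Python. This is an illustration only and not part of the Kentucky booklet: the variable names are assumptions, and the pairs (number of brothers and sisters, grade in reading) are transcribed from the table above.

```python
from collections import defaultdict

# (number of brothers and sisters, grade in reading), one pair per student,
# transcribed in table order. Names are omitted because some are illegible.
records = [
    (2, 80), (0, 85), (2, 76), (2, 84), (0, 87), (1, 85), (0, 79),
    (2, 77), (1, 81), (3, 70), (0, 83), (1, 80), (1, 76), (1, 82),
    (1, 86), (3, 80), (0, 90), (2, 83), (4, 72), (3, 75), (2, 79),
]

# Collect the reading grades of each sibling-count group.
grades_by_group = defaultdict(list)
for siblings, grade in records:
    grades_by_group[siblings].append(grade)

# Average each group: sum of grades divided by the number of students.
averages = {n: sum(g) / len(g) for n, g in sorted(grades_by_group.items())}

for n, avg in averages.items():
    print(f"{n} brothers and sisters: average reading grade {avg:.2f}")
```

Grouping before averaging is the step the novice-level sample response misses: it adds up ten grades from mixed groups and reports the sum as the average.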

Open-response 4: 

Katie believes that students who do between 4 and 10 hours of homework per week make better 
grades than students who do not do homework or who do more than 10 hours of homework 
per week. To test this hypothesis, she is writing a survey that she will give to students at her
school. 

• What questions should Katie include in her survey? 

• Describe the scientific procedure Katie should use.

• Describe what Katie should do with the responses to her survey to find if her hypothesis
is correct. 
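One way to carry out the last step of Open-response 4 is sketched below in Python. It is a hedged illustration, not part of the booklet: the survey responses are invented, the 0-100 grade scale is an assumption, and so is the choice to place students who do some homework but fewer than 4 hours in the same comparison group as students who do none.

```python
from statistics import mean

# Hypothetical survey responses: (homework hours per week, grade average 0-100).
responses = [(0, 71), (2, 75), (5, 84), (6, 88), (8, 86), (10, 83), (12, 80), (15, 78)]

def homework_group(hours):
    """Assign a response to one of the three groups in Katie's hypothesis."""
    if hours < 4:
        return "under 4 hours"
    if hours <= 10:
        return "4 to 10 hours"
    return "over 10 hours"

# Collect the grades reported in each homework group.
grades_by_group = {}
for hours, grade in responses:
    grades_by_group.setdefault(homework_group(hours), []).append(grade)

# Compare the mean grade of the middle group with the other two groups.
group_means = {group: mean(grades) for group, grades in grades_by_group.items()}
hypothesis_supported = all(
    group_means["4 to 10 hours"] > m
    for group, m in group_means.items()
    if group != "4 to 10 hours"
)

print(group_means, hypothesis_supported)
```

With real survey data, a comparison of group means like this would only suggest whether the hypothesis holds; it does not by itself rule out other influences on grades, which is the point the distinguished-level sample raises with its third and fourth questions.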
— DISTINGUISHED LEVEL —
Student Response Samples* 



The student at the distinguished level presents an articulate, thorough, and complete response
to each question, using concise and precise analyses. The response clearly shows an understanding
of scientific thought processes and procedures, the application of the same, and an extension beyond
the expected achievement for a student at this grade level.



OPEN-RESPONSE 2 



Though extremely minute, bacteria and fungi have an enormous role in life on Earth. Without them many
changes would occur. Bacteria break down dead organic matter so it will decay. If bacteria and fungi became
extinct, the dead animals would never decompose. Instead, they would keep accumulating until there would
be no room for humans to live. This would also cause serious environmental problems. The build-up of
the dead organisms would create poor living conditions, not to mention the sickening smell. Decayed organic
matter fertilizes the soil so that plants can grow. If the bacteria did not cause decay, the plants could
not grow. If plants did not grow, humans would be left without a food source. Food chains of every kind
would be upset without the help of bacteria and fungi. Also, oxygen could not be produced if there were
no plants. Some bacteria change the nitrogen in the air to a form that plants can use. Again, the lack
of bacteria would be a hinderance to plant growth. Fungi and bacteria also cause and spread diseases.
With the extinction of bacteria and fungi, many common diseases today may become extinct also. Though,
this would seem like a good change, it could also be a problem. If no animal or human contracted a disease
[illegible] In turn, a food shortage would occur. Life would be even more
[illegible] Therefore, this world would certainly be harmed by the extinction of bacteria [illegible]



• in-depth, multifaceted, creative response
• analyzes both positive and negative aspects in detail
• focuses on the global aspects of this scenario
• includes the effects on the physical environment



OPEN-RESPONSE 3 



My concusion is, the less siblings you have the better in reading you are. You will have higher grades
if you don't have any or very few brothers and sisters.

0 - 85 87 79 83 90

1 - 85 81 80 76 82 86

2 - 80 76 84 77 83 79

3 - 70 80 75

4 - 72

[Handwritten averages and graph are not legible.]





• all necessary components are present and correct 
OPEN-RESPONSE 4 



Katie should ask [1] On the average, how many hours per week do you spend doing homework? [2] What
are your grade point averages in Language Arts, Math, History, and Science? [3] What do you think
influences your ability to do well? [4] Do you have any problems at home or school that could be influencing
[illegible] of students in all grades and classes and
ask for their assistance in her experiment. She should take the information they give her and compile this
data. The she should make a conclusion by discovering, by doing an average according to grades and how
many hours of study, which group of students does the best and why. She could share her responses with
a teacher to test her hypothesis and get a second opinion.



• all components are well-defined and articulated

• raises creative, valid questions beyond the expected

• attempts to verify results by sharing information with a teacher



* Wherever typed student responses appear, student errors have not been corrected.
— PROFICIENT LEVEL —
Student Response Samples 

The student who has achieved a proficient level of performance has completed most of the
important aspects of each open-response question, and is able to articulate the information required.
While the overall quality of the responses displays proficiency, the student lacks the clarity and
creativity present in the distinguished student.

OPEN-RESPONSE 2 

If anything on earth dies out it will mess up the food chain and the land. If we no longer had fungi,
and bacteria the dead organisms wouldn't have anything to eat. they'd have to find something new or
die. chain reactions would go on for the entire food chain. People use a lot of fungi & bacteria in medicines
so we would have to find something new too.

• recognizes the global significance of bacteria and fungi 

• demonstrates adequate analysis of the question 

• lacks detail and clarity



OPEN-RESPONSE 3 




• shows correct averaging of the reading scores 

• graph is well done and complete 

• student has failed to generate a conclusion 



OPEN-RESPONSE 4 

Katie should ask these questions: "How many hours of homework do you do per week?" and "What grade
do you receive?". First, Katie should test her hypothesis by conducting the survey. She should gather
information from several people in order to get an accurate conclusion. Then, Katie must analize her results 
and draw a conclusion. Katie could make a graph of the survey to see if her hypothesis was correct. The 
graph would enable her to easily see the results. 

• two of the three crucial components answered adequately 

• third component is sketchy and vague 
— APPRENTICE LEVEL —
Student Response Samples 

At this level the student has completed some
important portions of the open-response questions. There
are significant gaps in the student's conceptual
understanding.

This student has begun to grasp some important 
aspects of scientific processes and thinking patterns. The 
apprentice student lacks large and critical pieces of the 
good response but has begun to display some fundamental 
knowledge of portions of the questions. 

OPEN-RESPONSE 2 



Well for one thing we would not have some medicines.
For exp. penicillin is a bacteria medicine. Made mostly 
of molds. Even food is made from fungi and bacteria. So, 
we do need these things in life. 



• recognizes some important concepts

• lists two disadvantages to the loss of bacteria/fungi

• fails to list any advantages

• lacks any in-depth discussion

• fails to recognize global significance



OPEN-RESPONSE 3 



The people with 0 siblings have better grades. Maybe cause 
they don't have to worry or fight with them. The people 
with a lot of siblings have bad grades probably cause they 
can't concentrate on studies. 

0 - 84.8% 

1 - 81.66%

2 - 79.83% 

3 - 75% 

4 - 72% 



• misses crucial component by not making any
attempt to graph the reading averages



OPEN-RESPONSE 4 



Questions: How many hours do you study a night.

Do you study before a test. 

What are your grades. 
Procedure: Gather as much data as possible and make 

a chart. 

Response: Give it to her teacher. 



• one of the three crucial components of this question
has been adequately addressed

• understands the appropriate questions to ask to
gather information

• does not understand how to conduct or analyze
a scientific study



— NOVICE LEVEL — 
Student Response Samples 

A student at the novice level of performance does
not grasp the question and has not responded to it in
a meaningful way.

Responses indicate little or no understanding of the
questions. Many responses are merely restatements of
the question, and often do not make sense. Some
students at the novice level will simply write nonsense
when facing a question that they don't understand.

OPEN-RESPONSE 2



There would be know bacteria and dirt around, & I think
that the world would be a whole lot cleanier and different.



• fails to process question in a meaningful way 

• reference to "dirt" is not appropriate for question 

OPEN-RESPONSE 3 



80, 76, 84, 85, 77, 81, 80, 76, 82, 86, = 807 
807 is the average 



• does not understand how to compute an average 

• has not attempted to construct a graph 

• has not drawn any conclusion 

OPEN-RESPONSE 4 



What do they study. After Katie got all the answers she
should try to do them to see if it really works.



• does not know appropriate questions to ask

• does not understand how to conduct a meaningful
scientific study

• unable to respond in a meaningful way
Grade 8

Social Studies Open-Response 
Common Items 



Open-response 2: 

In the United States, people are sometimes treated unfairly because of the group to which they
belong.

• Identify a group that has been treated unfairly because of sex, national origin, religion
or race.

• Describe several ways in which this group has been treated unfairly throughout United
States history. 

• Also describe several ways in which people have tried to correct these problems. 



Open-response 3: 

New York City, as of 1990, had a population of 7,322,564. Shelbyville, Kentucky, at the same
time, had a population of 6,238.

• What are SEVERAL opportunities people from a large urban center would claim they
have that people from rural areas do not have?

• How would people from the rural areas probably argue against those claims?

• What are some problems that people from the two types of communities would have
in common? 

Open-response 4: 

The shaded areas on the maps below indicate the extent of rain forests at two different times 
in the last 50 years. 

• Describe several changes that have taken place in the rain forests that explain the 
differences between the maps. 

• Why have these changes been occurring? 



[Map: Extent of Tropical Rain Forest - 1940]

[Map: Extent of Tropical Rain Forest - 1988]
— DISTINGUISHED LEVEL —
Student Response Samples*



Students at this level demonstrate the ability to make interpretations, draw conclusions, summarize,
analyze, and evaluate information, and provide logical arguments in support of a position based on
their understanding of social studies knowledge and concepts. They can clearly communicate ideas
and often "go beyond" what is asked for in a question by incorporating as many relevant ideas,
information, and examples as possible.

OPEN-RESPONSE 2 



We should take a lesson from the Japanese. They have risen from the ashes of WWII, and have
become a world power. But instead, Americans choose to insult them.

The Japanese have been discriminated against because of WWII and Pearl Harbor. They have been
riduculed because they are buying American companies. Most recently, we have had a name-calling contest
with them since their prime minister said that American workers were "lazy and illiterate."

Unfortunately, we have not done much to combat discrimination of the Japanese. But if we don't
like what they are doing by buying America, we must do something about it, and not by badmouthing
them.



• discusses group that has been discriminated against as well as the ways discrimination has
occurred 

• gives one possible solution for solving problem of discrimination 



OPEN-RESPONSE 3 



People who live in urban areas have distinct advantages over rural areas in some cases. They have 
ready access to jobs, banks, theatre, and almost anything else. There are more jobs available, and they 
have better education. 

However, in small towns, you know just about everyone, there is less crime, and your voice can 
be heard more in politics. 

Unfortunately, we share some of the same problems: drug abuse, child abuse, kids who don't want 
to learn, and people who don't like each other. These are problems you will face anywhere you go.

• discusses advantages and disadvantages of both urban and rural areas

• gives several problems that are common to both urban and rural areas 



OPEN-RESPONSE 4 

Every day, we lose hundreds of acres of tropical rain forest. Over 50 years, we have wiped out
a lot of animals and their habitat. The land on the maps has been cleared. There are several reasons why.
First is for farming. But, the soil in the rain forest is poor, and it can only support crops for a few
years. Farmers must clear more land. Second is for building homes. Third, many forests are being cut
down for their expensive woods, like ebony, teak, and mahogany. We must stop clearing rain forests. We
are killing animals and robbing ourselves of a wonderful treasure.

• gives accurate description of maps 

• discusses reasons for the disappearance of the rain forests 

• discusses problems that deforestation creates 
* Wherever typed student responses appear, student errors have not been corrected.
— PROFICIENT LEVEL —
Student Response Samples 

Proficient students can demonstrate an understanding of social studies concepts and/or issues
and generate a coherent discussion based on them. Students may be able to draw plausible conclusions
in light of their social studies knowledge, but may not be able to provide adequate support for their
positions.

OPEN-RESPONSE 2

The Native Americans have always been treated wrongly. Back during those first years of our country,
settlers took their land from them, destroyed their camps, killed the Indians' source of food and waged
an unrightful war against them. The Native Americans, once roaming the whole nation, were forced to
live on small reservations. At the beginning, there were men that warned Spain of the wrongs committed
against Indian. Since then, there have been people protesting against it, even during the settlers' days.
There were never enough, though and that should be our greatest shame. 

• indicates an understanding of a group that has been treated harshly and provides the reasons 

• gives historical basis for the answer, but is minimal in its breadth and scope 



OPEN-RESPONSE 3

There would be better schooling, more cultural centers, more opportunities for people to become involved
with activities. This would all happen because there are more people, more needs, and more tax money.
Rural people may feel that since there are less students, teachers may spend more time with the children.
They may feel that their cultural centers may be fewer in number, but better in content. They may feel
that the opportunities for the activities would be better handled because there are less people to handle.
Both communities would have to worry about crime, pollution, and homeless people.

• indicates a knowledge of advantages and disadvantages associated with urban and rural living,
with some inaccuracies



OPEN-RESPONSE 4 

The rain forests have definitely been shrinking. Every year thousands of acres are being destroyed. Trees
are cut down and land cleared and with it all, habitats for as yet unknown species. Plants that could 
produce life-saving medicines are being killed. The rain forests are cleared because of many reasons. The 
inhabitants of S.A. need land to grow crops, lumber to sell, and animals to export for pets. It all comes 
down to the basic role of money. 

• shows an understanding of, reasons for, and effects of deforestation 
— APPRENTICE LEVEL —
Student Response Samples 

A student at this level displays some understanding 
of social studies concepts. The student may clearly 
communicate some accurate knowledge, but a thorough 
understanding is not apparent. An apprentice level
student is beginning to incorporate facts with processing 
skills to develop a clear response. 

OPEN-RESPONSE 2 

I think mainly in our history blacks have been treated 
unfairly. During colonial times many blacks were being 
used as slaves. During the 40's, 50's, and 60's many
blacks were put under public humiliation under segregation 
and the lack of basic rights that the whites had. During 
the years black leaders like Martin Luther King Jr and 
Malcolm X have stood up for black right and are probably 
the main reasons why blacks have the same rights today 
as everyone else. 

• indicates some insight as to ways in which a group 
of people have been treated unfairly 

• provides minimal description of how people have 
tried to correct the problem 

• implies that problems have been resolved 



OPEN-RESPONSE 3 


People from a large urban center would claim that they 
have easier access to things like shopping centers and food 
markets. They would say that they have a special closeness 
with their neighbors and that they could trust them. 
People from a rural area would propably say that in a 
city they have more crimes and it's dangerous to walk 
the streets at night, but in the country is safe and quiet. 
They might have fire and theft problems alike. 

• generally lacks explanation and insight 

• gives some ways in which urban and rural areas 
are both alike and different 



— NOVICE LEVEL — 
Student Response Samples 

A student at this level is unable to clearly communicate
important ideas that would indicate an understanding 
of social studies concepts. Discussion of social studies 
issues at this level may be purely recall of fact with no 
strategy for processing the information coherently. 

OPEN-RESPONSE 2 

Sex is not all unfairly and pluse is you think sex is 
unfairly. It is to alot of young boys and girls. They 
shouldn't be talking about sex anyways. 

• fails to identify a group that has been treated 
unfairly 

• shows minimal understanding of the question 



OPEN-RESPONSE 3

They both have people and schools and jobs maybe the
school work is I don't know, i go to Dayton. New York
City has more people.

• lacks any attempt to compare and contrast
characteristics of urban and rural areas



OPEN-RESPONSE 4

In 1940 there is a lot of rain forests that In 1988 has 
less rain forest 

• makes a statement in reference to the change that 
has taken place in South American rain forest 
regions 

• no attempt to cite causes or effects of rain forest 
loss 



OPEN-RESPONSE 4 



There are many changes that have occured in the rain 
forest. Everyday men are bulldozing and burn down
rainforests just for land to raise cattle on. Many companies
are doing the same thing but using the land for apartment
houses, shopping centers, and other odd sources of
buildings. These changes mainly have been occuring 
because the population is steadily increasing in South 
America. With the rise in population they need more land 
to live on. 



• indicates some understanding of why the rain 
forests are shrinking 

• gives possible reasons but little explanation 



Advanced Placement Program 



1992 Mathematics: 
Free-Response Scoring Guide and Sample Student Answers 

Calculus AB Calculus BC

"Message to Teachers," sample student responses, scoring guides for question 1 of the
1992 AP Calculus AB Examination, and "Reminders for Secondary School Teachers" are
reproduced here.

The Advanced Placement Program is a cooperative educational endeavor sponsored by 
the College Board and administered by Educational Testing Service. The program serves three 
groups: students who wish to pursue college-level studies while still in high school, high schools 
that wish to offer these opportunities, and colleges that wish to encourage and recognize such 
achievement. The Program provides materials describing college-level courses to participating 
high schools and the results of examinations based on these courses to the colleges of the 
students' choice. Participating colleges, in turn, grant credit and/or appropriate placement to 
students who demonstrate qualifying performance on the examinations. 

Except for Studio Art, a portfolio assessment, the AP Examinations are approximately
50 percent free-response (essays, problems, programs, taped performances, etc.).
Course descriptions, teachers' guides, released examinations, free-response guides, student
guides, and other curricular materials are available in art, biology, chemistry, computer
science, economics, English, French, German, government and politics, history, Latin,
mathematics, music, physics, psychology and Spanish. Throughout the country, in all of these
disciplines, teacher development workshops are available year-round and week-long institutes 
are available in the summer. 

For more information, contact: 

Advanced Placement Program 
Mail Stop 85-D 
Educational Testing Service 
Rosedale Road 
Princeton, NJ 08541 



These materials were reproduced with the permission of the College Board and Educational 
Testing Service. 
1992 AP Mathematics




Dear Teacher: 



This Guide is an attempt to remove the mystery of how AP Mathematics free-response questions are 
scored. 

As you probably know, AP Examinations are created afresh each year by faculty development commit- 
tees and Educational Testing Service (ETS) test development specialists. All the committee members 
teach calculus in colleges, universities, and secondary schools. The multiple-choice questions are 
pretested in college calculus classes; they are then analyzed and scored so that the committee is able to 
judge their suitability for inclusion in the final examinations that AP students take each May. 

The scoring of the free-response questions takes place during June at the AP Reading. Before the 
faculty consultants — college and AP teachers from around the country — gather at the Reading, the 
chief faculty consultant, Professor Raymond J. Cannon of Baylor University, together with the two
examination leaders, prepared the first draft of the free-response scoring guides at Clemson University
in South Carolina before the arrival of the 40 table leaders. The scoring guides were refined after the 
table leaders scored the full range of sample responses. Using the agreed upon scoring guides and 
sample papers, the table leaders trained the 322 faculty consultants who arrived at Clemson on 
June 13. After the training, using the samples, faculty consultants were ready to score actual student 
responses. Throughout the Reading, the table leaders continuously monitored the faculty consultants, 
checking their scores and discussing particular student responses with them. Over the course of six 
days, 77,508 Calculus AB papers and 15,639 Calculus BC papers were read. 

To arrive at a total grade, the scores for each question given by faculty consultants were weighted and 
combined with the multiple-choice scores. Armed with statistical data about student performance, the 
apparent ability of the student group, and past score distributions, the chief faculty consultant set the 
grade "cut points" after consulting with ETS and College Board staff. 

This Guide illustrates the range and quality of student work. For each free-response question from the 
1992 AP Mathematics Examinations, the chief faculty consultant has provided a general comment and 
an observation about the performance of the candidates. Next you will find the scoring guides. Prob- 
ably most useful in your work will be the actual student responses and the chief faculty consultant's 
comments on those responses explaining why they merited the scores they received. National grade 
distributions follow, along with information about how to interpret them. 

We hope this publication will be a valuable aid to you in your teaching. 




Wade Curry, Director 
Advanced Placement Program 




THE 1992 AP CALCULUS AB EXAMINATION 



Free-Response Questions, Scoring Guides, and Sample Student Answers 
Question 1 

This opening problem requires some basic analysis of a polynomial graph. Students should have
answered Part (a) by considering the sign of the first derivative, noting that it remains negative on both
sides of the critical point at 0. Part (b) was to be answered by considering the sign of the second
derivative. In Part (c), students had to indicate that horizontal tangent lines occur at all three critical
points, including the "shelf" point at $x = 0$.

A favorite technique for analyzing the sign of $f'(x)$ and $f''(x)$ is through a "sign chart." Students
were required to have labeled such charts clearly, and to indicate what information was being used to
draw their conclusions.

1. Let $f$ be the function defined by $f(x) = 3x^5 - 5x^3 + 2$.

(a) On what intervals is $f$ increasing?

(b) On what intervals is the graph of $f$ concave upward?

(c) Write the equation of each horizontal tangent line to the graph of $f$.



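Solution and Scoring Scale: Points (See Footnote)

(a) $f'(x) = 15x^4 - 15x^2 = 15x^2(x^2 - 1)$

Sign of $f'$: positive for $x < -1$, negative for $-1 < x < 0$ and for $0 < x < 1$, positive for $x > 1$

Answer: $f$ is increasing on the intervals $(-\infty, -1]$ and $[1, \infty)$.

Scoring:
Analyzes sign of $f'(x)$ or explicitly sets student's $f'(x) > 0$
Answer

(b) $f''(x) = 60x^3 - 30x = 30x(2x^2 - 1)$

Sign of $f''$: negative for $x < -1/\sqrt{2}$, positive for $-1/\sqrt{2} < x < 0$, negative for $0 < x < 1/\sqrt{2}$, positive for $x > 1/\sqrt{2}$

Answer: The graph of $f$ is concave upward on $(-1/\sqrt{2}, 0)$ and on $(1/\sqrt{2}, \infty)$.

Scoring:
Analyzes sign of $f''(x)$ or explicitly sets student's $f''(x) > 0$
Answer

(c) $f'(x) = 0$ when $x = -1, 0, 1$

$f(-1) = 4$; $y = 4$
$f(0) = 2$; $y = 2$
$f(1) = 0$; $y = 0$

Scoring:
1: Solves student's $f'(x) = 0$
<-1> fewer than 3 solutions
1: One answer
1: All other consistent answers (must be at least one)
Note: For "answer only," maximum 2 of 3 if no $f'$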
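The sign charts in the solution to Question 1 can be checked numerically. The following is a minimal sketch in Python and not part of the AP scoring guide; the helper names and the choice of one sample point per interval are assumptions made for illustration. It uses $f(x) = 3x^5 - 5x^3 + 2$ and the derivatives given in the solution.

```python
import math

def f(x):
    return 3 * x**5 - 5 * x**3 + 2

def f_prime(x):
    return 15 * x**4 - 15 * x**2

def f_double_prime(x):
    return 60 * x**3 - 30 * x

def sign(value):
    return "+" if value > 0 else "-" if value < 0 else "0"

def sign_chart(func, breakpoints):
    """Sign of func at one sample point inside each interval cut out by the breakpoints."""
    pts = sorted(breakpoints)
    samples = [pts[0] - 1] + [(a + b) / 2 for a, b in zip(pts, pts[1:])] + [pts[-1] + 1]
    return [sign(func(s)) for s in samples]

# Zeros of f' (critical points) and zeros of f'' (candidate inflection points).
critical_points = [-1, 0, 1]
inflection_candidates = [-1 / math.sqrt(2), 0, 1 / math.sqrt(2)]

f_prime_signs = sign_chart(f_prime, critical_points)
f_double_prime_signs = sign_chart(f_double_prime, inflection_candidates)

# Each horizontal tangent line is y = f(c) at a critical point c.
horizontal_tangents = [f(c) for c in critical_points]

print("f' :", f_prime_signs)
print("f'':", f_double_prime_signs)
print("horizontal tangents: y =", horizontal_tangents)
```

A single sample per interval is enough here because a polynomial cannot change sign between consecutive zeros. The chart for $f'$ shows no sign change at 0, which is why the "shelf" point has a horizontal tangent but is not an extremum.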



Student Response 

1. Let $f$ be the function defined by $f(x) = 3x^5 - 5x^3 + 2$.
(a) On what intervals is $f$ increasing?

[Handwritten student work is not legible.]
(b) On what intervals is the graph of $f$ concave upward?

[Handwritten student work is not legible.]

(c) Write the equation of each horizontal tangent line to the graph of $f$.



Comment: In Parts (a) and (b), note how clearly this student communicates by first labeling the sign
charts of $f'$ and $f''$ and then showing the corresponding conclusions about the behavior of $f$. Because
of this clarity, faculty consultants were not confused when the student wrote the two "prime" signs for
$f''$ so close together that they may appear as one. The student is not penalized for the grammatical
error in Part (c), using "at" rather than "are." Similarly, the student's answer to Part (a) is technically
incorrect, since $f$ is not increasing on the set that is the union of these two intervals. However, because
the question asked for the intervals, the union sign was interpreted as "and on" and credit was
awarded. The score for this solution is 9.
Student Response

1. Let $f$ be the function defined by $f(x) = 3x^5 - 5x^3 + 2$.

(a) On what intervals is $f$ increasing?

[Handwritten student work is not legible.]



(b) On what intervals is the graph of $f$ concave upward?

(c) Write the equation of each horizontal tangent line to the graph of $f$.



Comment: In Part (a), this student showed algebraically that $f'(x) > 0$, but failed to solve that inequality
correctly. In Part (b), he or she managed, through some algebra that by itself appears suspicious, to
solve the desired inequality correctly. Note how the student used sign charts without indicating what
they mean, making it impossible to assign partial credit if the answer had been incorrect. In Part (c),
the student missed the first of the three points available because the answers were crossed off.
Historically, a penalty has been assigned for crossing off correct work, and here that penalty amounted
to one point. In the future, crossed off work will not be scored. The score for this solution is 5.






Reminders for Secondary School Teachers 

AP Examinations are designed to provide accurate assessments of achievement when the results are 
used properly. Any examination has limitations, however, especially when used for purposes other
than those intended. Presented below are general and specific suggestions for teachers to aid in the use 
and interpretation of AP grades.

• AP Examinations are developed and evaluated independently of each other. They are 
linked only by common purpose, format, and method of reporting results. Therefore,
comparisons should not be made between grades on different AP Examinations. An AP 
grade in one subject may not have the same meaning and interpretation as the same AP 
grade in another subject, just as national and college standards vary from one discipline to 
another. 

• AP grades are not exactly comparable to college course grades. The AP Program conducts 
research studies every few years in each AP subject to ensure that the AP grading standards 
are comparable to those used in colleges with similar courses. In general, these studies 
indicate that an AP grade of 3 is approximately equal to a college course grade of B at
many institutions. At some other institutions, an AP grade of 3 is more nearly comparable 
to a college course grade of C. These are only generalizations, however. The degree of 
comparability of an AP grade to a college course grade depends to a large extent on the 
particular college course used for comparison. 

• The confidentiality of candidate grade reports should be recognized and maintained. All
individuals who have access to AP grades should be aware of the confidential nature of the 
grades and agree to maintain their security. In addition, school districts and states should 
not release data about high school performance without the school's permission. 

• AP Examinations are not designed as instruments for teacher or school evaluation. A large 
number of factors influence AP Exam performance in a particular course or school in any 
given year. As a result, differences in AP Exam performance should be carefully studied 
before attributing them to the teacher or school.

• Where evaluation of AP students, teachers, or courses is desired, local evaluation models 
should be developed. An important aspect of any evaluation model is the use of an 
appropriate method of comparison or frame of reference to account for yearly changes in 
student composition and ability, as well as local differences in resources, educational 
methods, and socioeconomic factors. 

• The "Report to AP Teachers" can be a useful diagnostic tool in reviewing course results.
The report identifies areas of strength and weakness for each AP course. This information 
may also help to guide your students in identifying their own strengths and weaknesses in 
preparation for future study. 

• Many factors can influence course results. AP Exam performance may be due to the degree 
of agreement between your course and the course defined in the relevant AP Course 
Description, use of different instructional methods, differences in emphasis or preparation 
on particular parts of the examination (e.g., writing and organizational skills), differences
in pre-AP curriculum, or differences in student background and preparation in comparison 
with the national group. 



"Performance Assessment'' 



Education Research Consumer Guide
Number 2, November 1992 



This guide to performance assessment, produced by the U.S. Department of Education,
defines the topic and provides some basic information, including what the research says,
what performance assessment costs, and some examples of successful strategies and
programs. People to contact for more information are also listed.



Education Research Consumer Guide is a new series designed for teachers, parents, and
others interested in current education themes. Jacqueline Zimmermann is the editor. It is
published by the Office of Research, Office of Educational Research and Improvement, U.S.
Department of Education.
What is it? Performance assessment, also known
as alternative or authentic assessment, is a form of
testing that requires students to perform a task rather 
than select an answer from a ready-made list. For
example, a student may be asked to explain histori- 
cal events, generate scientific hypotheses, solve 
math problems, converse in a foreign language, or 
conduct research on an assigned topic. Experienced 
raters, either teachers or other trained staff, then
judge the quality of the student's work based on an 
agreed-upon set of criteria. This new form of as- 
sessment is most widely used to directly assess 
writing ability based on text produced by students 
under test instructions. 

How does it work? Following are some meth- 
ods that have been used successfully to assess 
performance: 

• Open-ended or extended response exercises
are questions or other prompts that require stu- 
dents to explore a topic orally or in writing. 
Students might be asked to describe their obser- 
vations from a science experiment, or present 
arguments an historic character would make concerning
a particular proposition. For example,
what would Abraham Lincoln argue about the 
causes of the Civil War? 

• Extended tasks are assignments that require sustained
attention in a single work area and are
carried out over several hours or longer. Such 
tasks could include drafting, reviewing, and re- 
vising a poem; conducting and explaining the 
results of a science experiment on photosynthe- 
sis; or even painting a car in auto shop. 

• Portfolios are selected collections of a variety of
performance-based work. A portfolio might in- 
clude a student's "best pieces" and the student's 
evaluation of the strengths and weaknesses of 
several pieces. The portfolio may also contain 



some "works in progress" that illustrate the im- 
provements the student has made over time. 

These methods, like all types of performance assess- 
ments, require that students actively develop their 
approaches to the task under defined conditions, 
knowing that their work will be evaluated according 
to agreed-upon standards. This requirement distinguishes
performance assessment from other forms
of testing.

Why try it? Because they require students to 
actively demonstrate what they know, performance
assessments may be a more valid indicator of stu- 
dents' knowledge and abilities. There is a big dif- 
ference between answering multiple choice 
questions on how to make an oral presentation and 
actually making an oral presentation. 

More important, performance assessment can pro- 
vide impetus for improving instruction, and in- 
crease students' understanding of what they need to 
know and be able to do. In preparing their students 
to work on a performance task, teachers describe 
what the task entails and the standards that will be 
used to evaluate performance. This requires a care- 
ful description of the elements of good performance, 
and allows students to judge their own work as they 
proceed. 

What does the research say? Active learn- 
ing. Research suggests that learning how and where 
information can be applied should be a central part 
of all curricular areas. Also, students exhibit greater 
interest and levels of learning when they are re- 
quired to organize facts around major concepts and 
actively construct their own understanding of the 
concepts in a rich variety of contexts. Performance 
assessment requires students to structure and apply
information, and thereby helps to engage students
in this type of learning. 



Curriculum-based testing. Performance assess- 
ments should be based on the curriculum rather than 
constructed by someone unfamiliar with the particu- 
lar state, district or school curriculum. This allows 
the curriculum to "drive" the test, rather than be
encumbered by testing requirements that disrupt 
instruction, as is often the case. Research shows that 
most teachers shape their teaching in a variety of 
ways to meet the requirements of tests. Primarily 
because of this impact of testing on instruction, 
many practitioners favor test reform and the new
performance assessments. 

Worthwhile tasks. Performance tasks should be
"worth teaching to"; that is, the tasks need to present
interesting possibilities for applying an array of 
curriculum-related knowledge and skills. The best 
performance tasks are inherently instructional, actively
engaging students in worthwhile learning activities.
Students may be encouraged by them to
search out additional information or try different 
approaches, and in some situations, to work in
teams. 

What does it cost? These positive features of 
performance assessment come at a price. Perfor- 
mance assessment requires a greater expense of 
time, planning and thought from students and teachers.
One teacher reports, "We can't just march
through the curriculum anymore. It's hard. I spend
more time planning and more time coaching. At 
first, my students just wanted to be told what to do. 
I had to help them to start thinking." 

Users also need to pay close attention to technical 
and equity issues to ensure that the assessments are 
fair to all students. This is all the more important as
there has been very little research and development 
on performance assessment in the environment of a 
high stakes accountability system, where adminis- 
trative and resource decisions are affected by mea- 
sures of student performance. 

What are examples of successful strate- 
gies and programs? 

• Charlotte Haguchi is a third- and fourth-grade
teacher at Farmdale Elementary School in Los 
Angeles. Regarding assessment and instruction 
as inseparable aspects of teaching, Ms. Haguchi 
uses a wide array of assessment strategies to 
determine how well her students are doing and to 
make instructional decisions. She uses system- 
atic rating procedures, keeps records of student 
performances on tasks, and actively involves stu- 
dents in keeping journals and evaluating their 
own work. Ms. Haguchi can be seen in action
along with other experts and practitioners in the



videotape Alternatives for Measuring Perfor- 
mance by NCREL and CRESST. (See Jeri 
Nowakowski and Ron Dietel, below.) 

• William Symons is the superintendent of Alcoa
City Schools in Alcoa, Tennessee. Seeking 
higher, more meaningful student standards 
through curriculum reform, Dr. Symons works 
with school staff and the community to create a 
new curriculum focused on standards and an as- 
sessment linked to the curriculum. Comments 
and advice from Dr. Symons and other practitio- 
ners and experts are available on the audiotape 
Conversations About Authentic Assessment by 
Appalachia Educational Laboratory. (See Helen 
Saunders, below.) 

• Ross Brewer is the director of planning and policy
development in the Vermont Department of Edu- 
cation. Vermont is assessing fourth- and eighth- 
grade students in writing and mathematics using 
three methods: a portfolio, a "best piece" from 
the portfolio, and a set of performance tasks. 
Other states that have been very active in devel- 
oping and implementing performance assess- 
ments include: California, Arizona, Maryland, 
New York, Connecticut, and Kentucky. (See Ed 
Roeber and state officers, below.) 

Where can I get more information? 

W. Ross Brewer 

Planning and Policy Department 

Vermont Department of Education 

Montpelier, VT 05602

(802)828-3135 

Carolyn D. Byrne 

Division of Educational Testing 

New York State Education Department 

Room 770 EBA

Albany, NY 12234 

(518)474-5902

Dale Carlson 

California Department of Education 
721 Capitol Mall 
Sacramento, CA 95814 
(916)657-3011 

Don Chambers 

National Center for Research in Mathematical 

Sciences Education 
University of Wisconsin at Madison 
1025 West Johnson Street 

Madison, WI 53706
(608)26S-4285 



Ron Dietel 

National Center for Research on Evaluation, 

Standards, and Student Testing 
An Open-Ended Exercise in Mathematics: A Twelfth Grade Student's Performance



Look at these plane figures, some of which are not drawn to scale. Investigate what might
be wrong (if anything) with the given information. Briefly write your findings and justify
your ideas on the basis of geometric principles.
Reprinted, by permission, from A Question of Thinking: A First Look at Students' Performance on Open-ended Questions in
Mathematics, copyright 1989, California Department of Education, P.O. Box 271, Sacramento, CA 95812-0271.
"A Look at a Middle School Portfolio"



Arts PROPEL: A Handbook for Visual Arts

Arts PROPEL is an approach to education that has evolved in the visual arts, music, 
and imaginative writing at the middle and high school levels. The project grew out of a commit- 
ment to develop non-traditional models of assessment appropriate for students engaged in 
artistic processes. Its larger goal is to find means to enhance and document student learning in
the arts and humanities. Supported by the Arts and Humanities Division of the Rockefeller 
Foundation, PROPEL was developed and field-tested during a five-year period from 1986- 
1991 by researchers at Harvard Project Zero and Educational Testing Service working in close 
collaboration with teachers and administrators in the Pittsburgh public school system. 



For information on Arts PROPEL, contact: 

Drew Gitomer 
Mail Stop 18-R 
Educational Testing Service 
Rosedale Road 
Princeton, NJ 08541 
609-734-1528 

For publications, contact: 

Marilyn Ispanky 
Mail Stop 37-B 
Educational Testing Service 
Rosedale Road 
Princeton, NJ 08541 
609-734-5073 



These materials were reproduced with the permission of Educational Testing Service.






ARTS PROPEL: 

A HANDBOOK FOR VISUAL ARTS 




This handbook was prepared by Allison Facte, Drew Gitomer, Linda Melamed, Elizabeth 
Rosenblatt, Seymour Simmons, Alice Sims-Gunzenhauser, and Ellen Winner, with the help 
of teachers and administrators from the Pittsburgh Public School System. 

Arts PROPEL Handbook Series Editor: Ellen Winner 

This handbook was co-edited by Ellen Winner and Seymour Simmons. 
A LOOK AT A MIDDLE SCHOOL PORTFOLIO



Janelle Hirschkopf was an 8th-grader in Pam Costanza's class at Rogers School for
the Creative and Performing Arts in Pittsburgh. Pam Costanza teaches in what might be
considered an ideal environment for developing a PROPEL approach to art education. She
is surrounded by supportive colleagues and administrators who share her belief about the
importance of art in education and who, in many cases, are similarly involved in PROPEL.
The students in her classes are also unusual in that, by the time they enter the sixth grade,
they have made a serious commitment to the study of art. They attend art classes during
their entire three years at Rogers. Sixth graders take art for a 40-minute period two to four
times per week. Seventh and eighth graders attend classes four days a week, and spend
three consecutive periods, or two hours and fifteen minutes each day, in art. Seventh and
eighth graders alternate regularly throughout the year between classes in two-dimensional
media taught by Pam and classes in ceramics taught by an adjunct teacher.

The portfolio shown here begins with two of the many sketches Janelle did as
weekly homework assignments. Figure 7.5 shows her first sketch, a drawing of a porcelain
figurine. The figure is rich in detail and challenging in terms of its proportions and surface
qualities. In later drawings, Janelle continued to focus on rendering details and surface
qualities, but now concerned herself with more subtle issues of texture and tone as she
began to "blow-up" small objects to several times their natural size (see Figure 7.6).





Figure 7.5 



Figure 7.6 
When asked in her end-of-term self evaluation which works she felt were
particularly successful, Janelle said: 

Pencil drawings are what I'm the most successful with. I've been
using pencils basically all my life. I can really create some great stuff
with it when I want to because I know what I'm doing when it comes
to that kind of medium.

She felt that her most frustrating experience was a watercolor assignment done around the 
same time as Figure 7.6. The frustration was due to feeling that she was unable to control 
the medium.

The first in-class two-dimensional domain project was a portrait unit. Building on
portrait drawing experiences from previous years, this project was structured to help
students learn to do portraits using a variety of styles and media. Students started with a
series of three-minute line drawings, using felt tip pens, and drew the person across the
table from them. Students made a blind contour drawing, a drawing using only circular
lines, and a drawing using only straight lines made with the help of a ruler (see Figures 7.7,
7.8 and 7.9).




Figure 7.7 



Figure 7.8 




Figure 7.9 



Figure 7.10 
For each drawing, students made entries in their journals about how they felt about
the drawing and the assignment. After all three drawings were done, they were put on the
board to critique and discuss. A fourth drawing was then begun, initially using the ruler
again to define the contours, then using oil pastels to paint in the face with "expressionist"
tints that portrayed the person's personality (Figure 7.10).

The completed oil pastels were put on the board for further discussion. At the end
of the project, students looked at a series of Cubist and Expressionist portraits by Picasso, 
and were asked to discuss their portraits in comparison to those by Picasso. 

The portrait project resumed, after several weeks spent on another assignment, with 
portrait drawings done in conte crayon and self-portraits done in pencil (see Figures 7.11 
and 7.12). These were also critiqued in class. Before starting a final portrait drawing, 
students looked at and discussed portraits by artists with very different styles: Botticelli,
Filippino Lippi, Vermeer, Rembrandt, Cassatt, Van Gogh, Modigliani, and Picasso.
Students selected one or more artists as models, and created a portrait or self-portrait using
the model as inspiration. Students could choose any of the media they had used during the 
project. As they began work, students recorded in their journals the objectives of the 
project, their chosen artist(s), subject, and media. These entries were accompanied by an 
explanation for the various choices. 
Janelle wrote as follows:



I am going to do a self-portrait in colored pencil... because I want to
learn how to use color to make portraits more interesting. The reason
why I'm doing myself is because, if I want to express myself in any
certain way by reflecting my personality, I know more about me, than I
know about other people in this room.

The style will resemble Botticelli's technique because I like how he
makes his models pose. I had to change my medium to pastels. It is
going to be more of a challenge but I guess it will make this project
more exciting. I'm doing it from shoulder up so you can focus your
attention mainly on the face, and since I'm starting a new medium it
would hold me back to worry about the body on top of a new
medium... In the background I'm going to put in deep blue sky with lots
of clouds simply because I'm a daydreamer. I love to just sit around
doing nothing but I am thinking and dreaming.





Figure 7.11 



Figure 7.12



After doing the portrait, students critique their work in light of the assignment and
their intentions (see Figure 7.13). These comments are summarized in the final portfolio 
review form. 
Figure 7.13

Although her teacher was pleased with the final self-portrait (above), Janelle was not.
When asked, in her final portfolio review, to select a work she felt was not completely 
satisfying, she chose this one, saying: 

I don't think it is expressive enough. The main reason why I picked a
self-portrait is because I wanted it to reflect me. I thought I would be
able to show myself through art but I didn't do too well.




A series of diverse activities followed the portrait unit during the second half of the
year. Among these were a watercolor unit, a Native American project, and a project done
in conjunction with a social studies unit on "immigration and integration." For the last
project, Janelle worked in collaboration with another student on a large pencil drawing
showing immigrants wearing the costumes of many cultures arriving at the port in New
York (Figure 7.14).

The final project for the year was determined by each student individually, the
primary requirement being that it draw upon material and concepts taught during the
year. For this assignment Janelle did another large drawing, this time in pen and ink. The
title of the drawing was "Family" (Figure 7.15).

Explaining her title, "Family," Janelle said:

Because that's what it is. I didn't want to say "the funeral" because
then people won't be able to make up their own story to it. So with,
say, just "family" you can come up with many different opinions and
stories.




Figure 7.15 



Commenting on her weaknesses, Janelle writes: 



Everything blends in with other things around it. I'm taking too long. 
I can't get all of the shadows to make sense. 



About her strengths, she writes: 



The people look like people except the girl in the chair has a beard. The 
stained glass windows' shades all look right. I managed to get 
patterns done easily, like on the window seat, and the wallpaper. 
Usually everything would be different, but these all are the same. 
JANELLE'S JOURNAL



In Janelle's journal are her drawings, along with her comments and observations,
reflecting her present and future concerns as an artist. She also included pictures and
writings that inspired her, such as cut out images of faces from magazines, antique postcards
and photographs inherited from an elderly neighbor, and lengthy articles on
portraiture and costumes apparently xeroxed from an encyclopedia. There are also
extensive drawings and quotes taken from a book on anatomy. Although many of these
images were eventually used for class projects, they were initially selected just because they
intrigued Janelle.

Other drawings in the journal include sketches and cartoons which show a lighter
and freer side of the student artist than one might expect looking at her class work (see
Figure 7.16 (cat) and Figure 7.17 (cartoon)). But also evident is a deep sensitivity to art and
experience, captured in reflections on her work and in observations such as those cited in
the box on the next page.

Pam collects the journals once each grading period and writes extensive comments, 
often in response to the student's observations. 
Figure 7.16
A PAGE FROM JANELLE'S JOURNAL

October 23, 1990. Sometimes, I don't think I'll ever fully
understand what Art is. There is so much of it and it comes in 
so many different forms and appearances. Lots of it is beautiful 
and lots of it is ugly. Many different feelings come with it, from 
graceful to clumsy, boring to shocking.

Art is everywhere and can be created from anything.
Whenever I am riding through the city at night, I see all the
buildings shining gloriously against the dark sky, the red tail
lights glowing in front of me, and the passing shadows that are
reflected from telephone poles, abandoned stores and other
objects that line the city streets, I think to myself "that would be
one of the coolist pictures in the world to be captured on
canvas." Then there is the late afternoon scene. Maybe it just
stopped raining and the appearance of everything has been
grayed and misted. Bright warm mornings in the summer.
Cold, black winter nights when the stars appear so sharp they
cut through the sky.

Everything gives you different feelings and thoughts. 
That's why I like to draw people so much. Everyone has its own
interesting personality and identity. Old people, young people, 
black, white, short, fat, tall and skinny. I hope that soon I will 
become a good enough artist to capture every intricate detail on 
my models. I hope to give them personality, feelings, moods, 
and appearances all with my pencil. But it will take a lot of
hard practice. 



In response, Pam wrote: 

These are beautiful and sensitive observations. "What
art is" is a question that changes constantly through
the years. That's why it's a good idea to record each
year what you think it is and how it changes from time
to time.
Assessment of Janelle's Portfolio



Pam provided the following mid-semester evaluation of Janelle's portfolio: 

Production: When comparing these two sketches (referring to Figures 7.5 and
7.6), one can see how Janelle's drawing skills have improved. The sketch
done in September was of a porcelain figurine. It is meticulous, detailed, and
richly textured. The size of the sketch was the same as the object itself. The
January sketch is a more refined rendering of a pin, an object much smaller
than the drawing. Janelle has captured the smooth "buffed" metal and
"highlighted" the reflective part of the eyes, nose, lips, and end of the moon
which were highly polished, demonstrating a greater sophistication.

Reflection: In Janelle's final portrait (Figure 7.13), she wanted it to express her 
personality. Although she did a beautiful rendering, she was not pleased; it 
did not accomplish her goals. There are many changes she would make that 
would express her personality more accurately. 

The following are some of Janelle's comments about the changes
she would make which were excerpted from a Portfolio
interview with Pam:

"I didn't put enough feelings into it and I think I need to change the
background because I'd put more feelings into it ... I could change the
pose a little bit ... the skin's too pale. It just looks possessed.
Everything stands out but the skin ... I think I'd want to put more of
my body in it so I can have it show more of me, other than just my
head. Maybe a different face expression ... I'd probably smile or
something.

The reason why I put the background in there is because I do daydream
a lot, but I don't do that all the time. I have something more wild and
something more alive than just sitting there. I wouldn't put shapes
because everybody puts shapes to express themselves. I'd have some
form of weird scenery . . . and I'd want people in the background.

I wouldn't make it so still. It just sits there and looks at you. It
doesn't have anything to it. I'd make it more alive, put more colors
into it. Make it more colorful and bright ... I'd have my hair flying
around everywhere."

Perception: I think the body of Janelle's work thus far has shown a keen sense of 
perception of her environment, from the subjects she chose for weekly 
sketches, to her self-portrait.



Multiple Challenges 



A Series of Questions Illustrating the 1992 
National Assessment of Educational Progress 



This packet illustrates the diversity of the questions on NAEP's 1992 mathematics, 
reading, and writing assessments. Over more than 20 years, NAEP has consistently used con- 
structed-response questions to augment the multiple-choice format — including hands-on tasks 
in science and mathematics; literacy tasks involving newspapers, charts and tables, bus schedules,
and pay stubs; interviews in reading and lengthy writing tasks.



For more information on NAEP, write to: 



PO Box 6710 

Educational Testing Service 
Princeton, New Jersey 08541-6710 



or to:



Education Information Branch 

Office of Educational Research and Improvement 

U.S. Department of Education 

555 New Jersey Ave., NW 

Washington, DC 20208-5641 
Multiple
Challenges 

A series of questions illustrating the 
1992 National Assessment of
Educational Progress 



Multiple 
Challenges 



This packet illustrates the diversity of the questions on NAEP's
1992 mathematics, reading, and writing assessments. We hope
that teachers will find it informative as they consider performance
assessment and develop their own classroom tests. As
with the entire NAEP, the sample questions presented herein
were developed with the thoughtful assistance of teaching
professionals.

NAEP is designed to present students with an array of innovative
tasks that make learning come to life. The range of challenges 
on the assessment parallels the complexity of the future. Stu- 
dents will be contemplating and adapting to new environmen- 
tal conditions and mastering mathematical and technological 
concepts and skills. They'll be reading, discussing, and elabo- 
rating on people and events in literature. They'll be organizing 
and synthesizing volumes of information and using a variety of
informative and persuasive techniques to communicate. 

During its 22-year history, NAEP has consistently used
constructed-response questions to augment the multiple-choice
format — including hands-on tasks in science and mathematics; 
literacy tasks involving newspapers, charts and tables, bus 
schedules, and pay stubs; interviews in reading, and lengthy 
writing tasks. The students invited to participate in the 1992 
assessment will be asked to address a variety of thought-provok- 
ing questions — giving policymakers and the general public 
useful information on what American schoolchildren know and 
can do. 

Since the framework for NAEP assessments is periodically
updated to reflect current thinking in each field, NAEP's objectives
and reports are a valuable resource for teachers developing 
instructional activities that are both innovative and intellectually 
challenging. The 1992 NAEP (in mathematics, reading, and
writing) includes diverse performance tasks designed to exam- 
ine student achievement in grades four, eight, and twelve. 
Mathematics



The National Council of
Teachers of Mathematics has 
set standards for mathemat- 
ics education that suggest 
students should be able to 
use mathematics as a way to 
solve practical problems, to 
communicate mathematical 
ideas to others, and to rea- 
son properly. Using technol- 
ogy in the classroom can 
accelerate the pace of stu- 
dent learning and help make 
school mathematics more 
like the mathematics people 
use in their everyday lives 
and on the job. 

The NAEP mathematics
assessment, which requires
students to use calculators,
rulers, and protractors,
contains questions that
direct students to sketch, 
measure, and manipulate 
geometric figures; to repre- 
sent algebraic equations 
graphically; and to give brief 
written explanations to sup- 
port solutions to problems. 
The framework for the as- 
sessment is organized ac- 
cording to mathematical 
abilities and content areas. 
The mathematical abilities 
assessed are conceptual 
understanding, procedural 
knowledge, and problem 
solving. The content areas 
assessed are Numbers and 
Operations; Measurement; 
Geometry; Data Analysis, 
Statistics, and Probability;
and Algebra and Func-
tions.

The constructed-response 
questions provide an ex- 
tended view of students' 
mathematical abilities that 
cannot be measured using 
multiple-choice questions. 
These include the ability to 
articulate mathematical
ideas, make estimates, de-
velop informal proofs, draw
figures, and generalize rela- 
tionships. 

The following are examples
of geometry tasks: number 1
for fourth grade; numbers 2,
3, and 4 for eighth and
twelfth grade; and number 5 
for twelfth grade. 




(The students would be given the 
cut-out cardboard triangle drawn 
above and a straightedge.) 

1. Piece A represents what geometric figure (shape)?

(Handwritten student response: isosceles triangle, isosceles right triangle)

2. Name three geometric properties of piece A. 

(Handwritten student response, partly illegible in this copy: ... equal sides ... equal angles)

3. Use piece A to show that the two figures below have 
equal areas. You may use drawings, words, and 
numbers in your explanation.

(Handwritten student response, partly illegible in this copy: ... pieces ... the other figure, so areas are equal.)
5. Use piece A to prove that the area of the figure
below can be found by using the formula
A = 1/2 h (b+c). You may use drawings, words, and
numbers in your explanation.

(Figure not reproduced; one side is labeled b.)




4. If the area of piece A is 3, what is the area of
the figure below? Explain how you found your 
answer. You may use drawings, words, and 
numbers in your explanation. 




(Handwritten student response, partly illegible in this copy: ... is made up of [four?] identical triangles)
Reading




Recognizing that readers involve 
themselves differently with 
different material, depending on 
the kind of text and the purpose
for reading, the 1992 NAEP
reading assessment uses a
variety of passages to assess
specific kinds of reading. NAEP
examines performance in three 
basic reading situations — 
reading for literary experience, 
reading to obtain information, 
and reading to perform a task. 
For example, we read novels for 
literary experience, newspapers 
for information, and instruc- 
tional manuals to accomplish 
tasks. As readers' purposes 
change, so do their approaches 
to the material, even when 
using the same text. 

The assessment allows students 
either 25 or 50 minutes to read 
and respond to questions from 
reading material drawn from 
published sources. Within each 
of the following situations, 
certain reading abilities are 
assessed - every reader needs 
to draw on them to read effec- 
tively. 

• Building an understanding 

of a passage includes forming 
an initial, global understanding 
of it. To assess this understand-
ing, NAEP asks constructed-
response questions like "What
does this passage say to you?" 
or "What does the author think 
about the topic discussed in the 
paragraph?" Usually, the first
question taps the reader's initial
impression of the text; follow-up 
questions deal with a more in- 
depth understanding. 

• Developing an interpreta- 
tion goes beyond the initial 
impression to seek more com- 
plete understanding. Assess- 
ment questions that measure 
this ability include: "How does
this character change from the 
beginning to the end of the 
story?" or "What caused the little 
girl to get angry?"

• Personal reflection and 
response requires the student 
to relate the text to personal 
knowledge or experiences. 
Questions that require personal 
reflection and response include
"How is this story like or differ-
ent from your experiences?" or
"What additional information 
would you like to know about 
this topic?" 

• Demonstrating a critical
stance requires the student to 
consider the text objectively. It
involves evaluating, comparing
and contrasting, applying knowl- 
edge, and understanding text 
features such as the author's 
use of irony and the organiza- 
tion of ideas. Some critical 
stance questions require read- 
ers to make connections across 
parts of a text or between texts. 
Questions that require a critical 
stance include "What could be 
added to improve the author's 
argument?" or "Poem A and 
Poem B have similar themes, 
but how are they different?" 

In 1992, NAEP is conducting a
special study of students' oral 
reading skills called the Inte- 
grated Reading Performance 
Record (IRPR) - for a descrip- 
tion, see Special Studies in 
Reading & Writing in this 
booklet. 

The following are examples of 
an eighth-grade reading task to 
assess informational experience 
and a fourth-grade reading task 
to assess literary experience. 
The Constitution
of the United States 

There is a building in Washington, D.C., called the National Archives, where our
most valuable historic documents and materials are stored. Journals and newspapers,
personal letters, photos that date back to the time when photography was invented —
these are some of the Archives' important treasures. But there are two documents in the
Archives that hold a place of special honor. Housed in a display area that is itself
something of a marvel, these two documents attract thousands of visitors every year.
They are our Declaration of Independence and the Constitution of the United States.

The Declaration of Independence, faded and barely readable, and the Constitution,
in excellent condition after almost two hundred years, attract visitors not because they
are old and beautiful to look at, but because they represent the basic ideas that
Americans live by. When visitors have the opportunity to view the actual documents,
many feel in awe of what the documents stand for. They were written at a time when
Americans were struggling to create a new and lasting government. Men and women —
including John and Abigail Adams, Thomas Jefferson, Alexander Hamilton, Ben
Franklin, and James and Dolley Madison — were studying, debating, and presenting
arguments for different kinds of governments. The basic ideas that affect how we live
today were set down from 1776, when the Declaration of Independence was issued, to
1788, when the Constitution was ratified.

The Stage Is Set For the Convention 

The period following America's War of Independence was a difficult time for our 
country. During the war, the Continental Congress, which was the assembly of people
On the basis of what you know from this article
and your personal knowledge, explain whether the 
Articles of Confederation helped or hindered 
progress toward a new form of government. 



Think about the plan to "fix" Anthony. Write a
paragraph explaining whether or not this plan is a
good idea. Give examples from the poem. 
Writing



The writing assessment re- 
quires each student to write 
either two 25-minute essays or 
one 50-minute essay. Topics 
range from informative to 
persuasive to narrative and ask 
students to generate ideas, 
synthesize information, and 
organize their thoughts. Each 
task is accompanied by a "plan- 
ning page" for students to use 
in jotting down ideas, diagram- 
ming, or outlining their 
thoughts. NAEP then evaluates
the quality and fluency of these 
responses and monitors the 
trends in students' writing 
performance across the years. 
Students' responses are evalu- 
ated on the basis of their suc- 
cess in accomplishing the 
specific purpose of each writing 
task, as measured by an en-
hanced application of "primary
trait" scoring. Based on a six- 
point scale, the evaluation 
criteria measure students' 
success in selecting, organiz- 
ing, and presenting relevant 
information as well as their use 
of effective organizational and 
development strategies (e.g., 
compare, contrast, and anec- 
dote). 

In 1992, NAEP is conducting a
special study of classroom- 
based writing called The 
Nation's Writing Portfolio - 

for a description, see Special 
Studies in Reading & Writing in 
this booklet. 

The following are examples of 
eighth- and twelfth-grade writ- 
ing tasks, respectively. 



Think about a favorite story you have read or
heard, or one that you have seen in the movies or on
television. Write a paper, telling what the story is about 
and why you like it. Help other people to understand why 
you think that it is a good story. Use examples from the 
story, such as details about characters, places, events, or 
ideas. 

Write your paper on the lined pages. 
Your school has decided to employ students to fill
several new part-time jobs at school. You want to gain 
more work experience and the pay is attractive, so you 
have decided to apply for one of these jobs. To be 
considered, you need to submit a letter of application,
identifying the job you want and summarizing your
qualifications for it. Many students will apply for each
job, so you need to convince the employment director that 
you are the best candidate. 

Many positions are available. For example, the school 
will hire students to paint classrooms, do library work, 
fix up recreational areas and equipment, help maintain a 
computer system, make props and costumes for special 
events, and help teachers prepare materials for their 
classes. Students can also suggest jobs that they think
are needed at school. Students can perform these jobs
either during the school year or during the summer. 

According to the employment director, your applica- 
tion letter should include the following elements: 



• a short description of the job you would like
to have at school;

• a summary of the skills that you think are
needed to do this job well; and

• personal information on previous work
experience, job-related interests, future
employment or educational plans, and other
relevant details.
Special Studies in
Reading & Writing



As teachers devote more instruc- 
tion time to presenting students 
with multiple challenges, NAEP
continues to devise alternative 
methods of assessing how well 
students are meeting these 
challenges. 

The 1992 NAEP assessments 
provide two special components 
that will extend and enhance the 
information from the main 
reading and writing assessments. 

The Integrated Reading Per- 
formance Record (IRPR), a 

special study to augment the 
1992 paper-and-pencil reading 
assessment, will yield detailed 
information on students' oral
reading fluency and comprehen- 
sion. 

A representative subsample of 
fourth graders participating in the 
main NAEP reading assessment
will be chosen to take part in 
audiotaped interviews conducted 
by trained administrators. 

Students are asked to bring to 
the interview their current read- 
ing textbook, three samples of 
work completed for reading 
class, and a favorite book. They 
discuss with the interviewer their 
independent reading experi- 
ences and samples of their 
classroom work. Students also 
reread one of the passages they 
responded to in the written 
portion of the assessment and 
orally answer questions about it. 



While the student reads aloud, 
the administrator notes miscues, 
reading time, and the student's 
phrasing of text. 

The IRPR, together with the 
paper-and-pencil assessment,
provides a more in-depth view of
reading ability by allowing stu- 
dents several opportunities to 
demonstrate achievement. 

The Nation's Writing Portfolio 

is designed to collect students' 
best papers. This study of class- 
room-based writing also de- 
scribes the assignments typically 
given to students and the broad 
range of procedures and strate- 
gies students use to complete 
the tasks. 

Through the portfolio assess- 
ment, students can demonstrate 
their skill at writing when they
have the opportunity to edit and
revise their work. NAEP also can
evaluate the relationship be- 
tween writing produced in the 
classroom and that produced 
under timed conditions. 

Teachers, working with fourth 
and eighth graders who partici-
pate in the main NAEP writing
assessment, will review and 
select three papers that best 
illustrate each student's achieve- 
ment as a writer. 

The papers will represent a range 
of types of writing tasks (stories, 
reports, essays, and persuasive
pieces) and the use of writing
process strategies (successive 
drafts, use of reference sources, 
and peer review). 

Along with their papers, students 
will also attach a brief explana- 
tion of their choice of papers to 
be submitted. Teachers will fill 
out a short questionnaire de- 
scribing the assignments and the 
instruction that led to each 
student's writing. 

Analyzing the Portfolios 

Each portfolio will undergo a
three-part analysis. First, a
trained reader completes a 
descriptive coding sheet noting 
the type and form of writing, the 
audience, the number of words, 
evidence of revising, and other 
information. Second, the reader 
evaluates pieces classified as 
narrative, informative, or persua- 
sive, according to criteria estab- 
lished by teachers. Papers are 
rated from 1 to 6. Then, using
the teacher questionnaire and 
student letter, the reader synthe- 
sizes the information about the 
assignments and contexts that 
produced the pieces of writing. 

For narrative writing, a paper
rated as 1 contains a list of
sentences minimally related - 
while a rating of 2 is given to 
brief papers with a few details 
about settings, characters, or 
events. A 3 rating is given to a
paper that describes a series of 
events but lacks cohesion due to 
problems with syntax, sequenc-
ing, missing events, or an unde-
veloped ending. While a paper
rated as 4 describes a sequence
of episodes, including details 
about most story elements, it 
nonetheless is confusing or 
incomplete. A rating of 5 is given 
to a paper that, while describing 
a sequence of episodes in which 
almost all story elements are 
clearly developed, may have one 
or two problems or include too 
much detail. A rating of 6 repre- 
sents a paper with a well-devel- 
oped description of episodes, 
elaborated resolution of goals or 
problems, and cohesive presen- 
tation of events. 

For informative writing, a 

paper rated as 1 lists pieces of 
information or ideas all on the 
same topic, but does not relate 
them, while a 2 rating represents 
a paper that relates a broader 
range of information without
clearly establishing ideas, expla- 
nations, and details. A paper 
rated as 3 includes a broad range 
of ideas with some established 
relationships, while a 4 rating 
signals clearly related information
and use of rhetorical devices. A
rating of 5 is given to papers that
present well-developed informa- 
tion and relationships with 
explanations and supporting 
details, while a paper rated as 6 
has an overt organizational 
structure, a coherent sense of
purpose and audience, and is 
free from grammatical problems. 

For persuasive writing, a paper
rated as 1 states an opinion but
does not support it, while a 2
rating is given to papers that 
support an opinion but fail to 
explain the reasons. A paper 
rated as 3 attempts to develop 
reasons for the opinion with
explanations that are not devel-
oped or elaborated, while a
paper rated as 4 supports the 
reasons with explanations devel- 
oped through the use of rhetori- 
cal devices. A 5 rating is given to 
a paper that contains an oppos- 
ing point of view and an attempt 
to discuss and/or refute it, while 
a paper rated as 6 summarizes 
opposite points of view and 
clearly and explicitly refutes
them. 

Further information about the 
scoring guides for the National
Assessment can be obtained 
from the project staff. 
Learning by Doing



A Manual for Teaching and Assessing Higher-Order Thinking 
in Science and Mathematics 

Intended for use by science and mathematics coordinators and teachers, this manual 
presents 11 tasks field-tested by NAEP during a pilot study in 1986. The tasks were selected to 
show a range of possibilities for both classroom and assessment use. Many of the ideas underly-
ing the hands-on tasks can be adapted to a variety of different mathematics and science con-
cepts. Each task is identified by the thinking skills necessary for successful student perfor- 
mance and the administrative mode used by NAEP. The presentation for each task includes a 
brief description of the activity, the student response sheet, a list of the equipment used, and
one or more exemplary student responses. 

Learning by Doing is adapted from A Pilot Study of Higher-Order Thinking Skills Assessment 
Techniques in Science and Mathematics, Final Report. This two-volume, 537-page report,
which describes NAEP's project in detail and presents all 30 tasks included in the pilot study 
— six group activities, 20 station activities, and four complete experiments — is available for 
$35 plus shipping and handling from: 

NAEP 
CN6710 

Educational Testing Service 
Princeton, NJ 08541-6710 
NAEP's 1990 Writing Portfolio Study



Approximately 4,000 students who participated in the 1990 NAEP writing assessment, 
half in grade 4 and half in grade 8, were invited to participate in a special study using portfo-
lios. We have reproduced the Table of Contents to this 188-page report, the Introduction, and
"Examples of the Narrative Scoring Guide" from Chapter 2, "Evaluating the Writing."

The full report, authored by Claudia Gentile, is titled Exploring New Methods for 
Collecting Students' School-based Writing and was issued in April 1992 by Educational Test-
ing Service under contract with the National Center for Education Statistics.

For ordering information on this report, write: 

Education Information Branch 

Office of Educational Research and Improvement 

U.S. Department of Education 

555 New Jersey Ave., NW 

Washington, DC 20208-5641 

or call 1-800-424-1616 (in the Washington, DC metropolitan area call 202-219-1651).






NATIONAL CENTER FOR EDUCATION STATISTICS 




Exploring New Methods
for Collecting Students' 
School-based Writing 

NAEP's 1990 Portfolio Study 



CLAUDIA GENTILE 



APRIL 1992 



THE NATION'S
REPORT
CARD




Prepared by Educational Testing Service for the National Assessment of Educational Progress 1990 Writing Assessment
under contract with the National Center for Education Statistics, Office of Educational Research and Improvement,
U.S. Department of Education


For sale by the U.S. Government Printing Office
Superintendent of Documents, Mail Stop: SSOP, Washington, DC 20402-9328
ISBN 0-16-036189-3



Table of Contents



INTRODUCTION 2 

Purpose 2 

Collecting Students' Writing 5 

Outline of this Report 7 

CHAPTER ONE Describing the Writing 8 

Types of Writing 9

Audience 10 

Evidence of the Use of Process Strategies 10 

Evidence of the Use of Resources for Writing 11 

Length of Papers and Use of Computers 12 

Types of Activities 12 

Summary 16 

CHAPTER TWO Evaluating the Writing 18 

Developing Evaluative Guides 18 

The Narrative Scoring Guide 20 

The Informative Scoring Guide 21 

The Persuasive Scoring Guide 22 

Applying the Evaluative Guides 23 

Examples of the Narrative Scoring Guide 29 

Examples of the Informative Scoring Guide 45 

Examples of the Persuasive Scoring Guide 52 

Summary of Performance Across Domains 58 

Summary 60 

CHAPTER THREE Comparing Methods of Assessment 62 

Features of the Assessment 62 

Comparing Students' Performance 64 

Lessons Learned 66 

Summary 71 

CHAPTER FOUR Samples of Students' Writing 76 

Part 1: Narrative Writing 76

Part 2: Informative Writing 109 

Part 3: Persuasive Writing 141 

Part 4: Poems 158 

Part 5: Letters 165 

Part 6: Research Reports 171 

APPENDIX A Demographic Characteristics 184 

APPENDIX B Students' Performance by Process Strategies 186 

ACKNOWLEDGMENTS 187 




Introduction 



Purpose 

In recent years, teachers nationwide have been using process approaches to 
writing instruction to help students become effective communicators. Many 
students write major texts over extended periods of time, and in many class- 
rooms, writing instruction encompasses a range of interrelated activities that 
engage students in pre-writing activities, drafting, and revision. As a part of
this process, student writers often consult with peers, teachers, and parents. 
The aim of these methods is to enable students to produce richer, more 
developed pieces of writing. 

However, we face a problem when we try to assess the extent to which these 
efforts are successful. Traditional methods of evaluating students' writing (in 
particular, the timed essay test) are designed to measure a specific facet of 
writing ability — how well students can write on an assigned topic under 
timed conditions. They are not designed to capture the range and depth of
the writing processes in which students engage during process writing 
instruction programs.

It is possible to emulate aspects of the process approach to writing within 
the context of traditional writing assessment methods. For example, the time 
allocated for writing can be increased, and can even be held over several days 
to allow for peer review and other classroom activities (e.g., New Brunswick,
Canada Reading and Language Arts Multi-day Assessment Program). How-
ever, holding an assessment over several days poses operational difficulties,
increasing the costs and complexity of assessments. 



Janet Emig. The Composing Processes of Twelfth Graders. (Urbana, IL: National Council of
Teachers of English, NCTE Research Report No. 13, ERIC Document No. ED 058205, 1971).

Nancy Atwell. "Making the grade," in Understanding Writing: Ways of Observing, Learning, and
Teaching (2nd edition), Thomas Newkirk and Nancy Atwell, editors. (Portsmouth, NH:
Heinemann, 1988).

Hunter M. Breland, Roberta Camp, Robert J. Jones, Margaret M. Morris, and Donald A. Rock.
Assessing Writing Skill. (New York: College Entrance Examination Board, 1987).

C. K. Lucas. "Toward ecological evaluation, Part 1." The Quarterly, 10 (1), 1-3, 12-17, 1988.

New Brunswick Reading and Language Arts Assessment Program. (Ministry of Education, New
Brunswick, Canada, 1991).
Another way of establishing stronger connections between process writing
curriculums and assessment methods is to adapt an instructional tool —
writing portfolios — for assessment purposes. Recently, schools, districts,
and states have been exploring ways of using classroom writing portfolios to
assess students' writing achievements. Using the writing students have pro-
duced as they engage in process writing programs establishes an immediate
connection between the assessment and the writing process curriculum.
Recent efforts to adapt writing portfolios for assessment purposes can be 
classified into three types: the classroom portfolio, the combination portfolio, 
and the assessment portfolio. 

The Classroom Portfolio While Classroom Portfolios differ from 
classroom to classroom, they usually share several key characteristics. During
the school year, as part of their English/language arts classwork, students 
collect their written work in folders. At specific points in the term, they review 
their work and create a portfolio by engaging in a process of reflection, selec-
tion, and description (e.g., New York City Portfolio Project, ARTS Propel).

The reflection and selection stages are guided by a set of criteria devel- 
oped by teachers and/or students, based on the writing curriculum they are 
following. These criteria often focus on the depth of student writing (writing
that demonstrates the use of process strategies and writing that shows growth 
over time) and on the breadth of student writing (writing that illustrates the 
range of activities in which students have engaged). 

Often the students determine how many pieces to include in their port- 
folios, with a minimum of three being common practice. A central element of 
these portfolios is the letters or statements students write explaining their 
selections and how their choices meet the selection criteria. This process of 
reviewing and evaluating one's own writing and then articulating one's deci-
sions is considered central to the portfolio experience because it fosters
students' development as writers. The classroom teachers assist students
throughout this process and also evaluate the portfolios. Sometimes other 



S. Murphy and M. A. Smith. "Talking about portfolios." The Quarterly, 12 (2), 1990.

D. Galleher. "Assessment in context: Toward a national writing project model." The Quarterly, 9
(3), 5-7, 1987.

Robert J. Tierney, Mark A. Carter, and Laura E. Desai. Portfolio Assessment in the Reading-
Writing Classroom. (Norwood, MA: Christopher-Gordon Publishers, Inc., 1991).

Roberta Camp. "Thinking together about portfolios." The Quarterly, 12 (2), 8-14, 27, 1990.

Mary Fowles and Claudia Gentile. Evaluation Report of CUNY Lehman's Writing Across the
Curriculum Program. (Princeton, NJ: Educational Testing Service, 1989).

Denny P. Wolf. "Opening up assessment." Educational Leadership, 45 (4), 24-29, December
1987/January 1988.

E. Winner and E. Rosenblatt. "Tracking the effects of the portfolio process: What changes and
when?" Portfolio, 1 (5), 21-26, 1989.
students, friends, and family read and comment on students' portfolios.
Students may collect portfolios for part of the year, the whole year, or over 
their whole academic careers, for one class or all classes. 

The Combination Portfolio The second type of portfolio assessment 
system uses a combination of approaches to collect writing from students 
(e.g., Vermont Portfolio Project). In addition to asking students to assemble
a portfolio from the work they have collected for their classes, students are 
asked to select a "best piece" and to include in their letter describing their 
portfolio an explanation of what makes this their best effort. Students may 
also be asked to complete a writing activity common to all students in a 
particular class or group. These three components — portfolio, best piece, and 
common piece — are then evaluated individually by one or more teachers and 
evaluative information is presented on each component, resulting in a profile 
of an individual student's writing achievements. Summary statements to 
students about their entire portfolios are also made by their classroom 
teacher, other teachers, and/or other students. 

The Assessment Portfolio The third type of portfolio assessment 
system involves administering several common writing activities to students 
(e.g., Rhode Island Portfolio Project). Committees of teachers design a series
of multi-day writing activities that reflect their writing curriculum. On the 
same days, using the same administration procedures, the teachers have their 
students engage in these activities. They collect the students' work in folders 
and have the students review their work and write letters explaining which 
activity yielded the best writing and from which they learned the most. A 
committee of teachers then meets to score the students' responses to each 
activity. The result is a profile of each student's achievements relative to the
common tasks. This type of portfolio differs from traditional essay assessments 
in that the activities are designed to match a specific school's or state's cur- 
riculum and the students' work is accomplished as part of their regular 
classroom activities rather than under standardized assessment conditions. 

The 1990 NAEP Pilot Portfolio Study In keeping with these new
developments, the National Assessment of Educational Progress (NAEP) has 
begun exploring alternative methods of assessing students' writing achieve-
ments — methods that focus on the writing students regularly produce as
part of their classroom activities. NAEP conducted a pilot portfolio study in 



J. Flood and D. Lapp. "Reporting reading progress: A comparison portfolio for parents." The
Reading Teacher, 42 (7), 508-514, 1989.

R. P. Mills. "Portfolios capture rich array of student performance." The School Administrator, 8-
11, 1989.

Mary Fowles and Claudia Gentile. Validity Study of the 1988 Rhode Island Third-Grade Writing
Assessment. (Princeton, NJ: Educational Testing Service, 1989).
1990 in order to explore the feasibility of conducting large-scale assessments
using school-based writing. The main purposes of this pilot study were: (1) to 
explore procedures for collecting classroom-based writing from students 
around the country; (2) to develop methods for describing and classifying the 
variety of writing submitted; and (3) to create general scoring guides that 
could be applied across papers written in response to a variety of prompts or 
activities. 

To this end, a nationally representative subgroup of the fourth and eighth 
graders who participated in NAEP's 1990 writing trend assessment was asked 
to work with their teachers and submit one piece of writing that they consid- 
ered to be a sample of their best writing efforts. The goal was to create a 
"Nation's Portfolio" — a compilation of the best writing produced by fourth 
and eighth graders in classrooms across the country. 

NAEP analyzed and summarized these samples of writing along with teach- 
ers' descriptions of the assignments that produced them. In addition, NAEP 
compared students' school-based writings to their responses on the 1990 
NAEP writing assessment to examine relationships between these two modes 
of assessment. This report describes the procedures used to collect, describe, 
and evaluate the school-based writing in this special pilot study. 

The 1990 writing assessment was a trend assessment — prompts that had 
been developed for the 1984 assessment, and readministered in 1988, were 
also given in 1990 in order to measure changes in students' writing achieve- 
ments across the six-year period. In 1992, NAEP will continue the writing 
trend assessment, as well as conduct a new writing assessment comprised of 
informative, narrative, and persuasive writing prompts developed specifically
for the 1992 assessment. While the trend writing assessment has not changed 
since 1984, the new 1992 writing assessment reflects recent developments in 
the field of writing instruction and assessment. For example, the time allo- 
cated for writing has been expanded to 25- and 50-minute periods. Also, a 
planning page has been included after each prompt, to encourage students to 
reflect and plan their responses to the topics. The 1992 assessment will also 
include a revised and expanded version of the 1990 pilot portfolio study and 
participants will be selected from among those students taking the new 
regular writing assessment. 

Collecting Students' Writing

The Participants Approximately 4,000 students who participated in the 
1990 NAEP writing assessment — 2,000 students at grade 4 and another 2,000 
students at grade 8 — were invited to participate in the special portfolio study. 
Based on traditional NAEP sampling procedures, this group would have been a 
nationally representative sample of the nation's fourth and eighth graders. 
However, only 55 percent (1,110 students) of the fourth graders and 54 percent 
(1,101 students) of the eighth graders and/or their teachers accepted this 
invitation. While these response rates provided enough papers to permit an 
analysis of the writing submitted on a pilot basis, as statistical samples they 
were too small to make generalizations about all of the nation's fourth and 
eighth graders' writing performances. 

While the participants did not represent a national sample of students, they 
were from all of the major geographic regions and from various types of 
communities, including rural, suburban, and inner city. They represented a 
variety of racial/ethnic backgrounds as well as a balance between males and 
females (see Appendix A for details on the demographic characteristics of the 
participants). 

Compared with the entire group of students who participated in the 1990 
NAEP writing assessment, the participants of this study differed in some 
respects. Slightly higher percentages of the portfolio pilot study participants: 

• were above the modal ages of the sample (ages 9 and 13), 

• attended schools in advantaged urban communities, 

• reported having higher grades, 

• reported having a greater number of reading materials at home, and 

• received slightly higher scores on the NAEP writing assessment tasks. 

When considering the data from this pilot study, it is important to keep in 
mind that the students who participated appear to be somewhat older, higher 
achieving, and more advantaged than the larger population of students 
assessed by NAEP in 1990. 

The Procedures In the spring of 1990, at the time of the NAEP writing 
assessment, the English/language arts teachers of participating students were 
asked to help several of their students choose a sample of their own best 
writing from the work the students had completed so far in the 1989-90 
school year. No more than 10 students from any given class were selected to 
participate. Teachers were asked to encourage their students to choose pieces 
that had involved the use of writing process strategies (such as revising suc- 
cessive drafts, using reference sources, consulting with others about writing). 
NAEP also asked teachers to attach a description of the activities that gener- 
ated the students' writing and to comment on any process strategies the 
students used to produce their writing. 



Teachers then submitted their students' writing to NAEP, along with a copy 
or description of the activities that generated the writing and any available 
drafts or prewriting samples. These pieces were used to create two national 
portfolios or collections of students' classroom writing — one containing the 
writings of fourth graders and the other containing the writings of eighth 
graders. 

Unfortunately, due to the complex procedures NAEP employs to select 
students to participate in its assessments, we were unable to inform teachers 
at an early date which of their students would be participating in this study, 
with some teachers receiving only several days' notice. Thus, for the pilot, 
teachers and students did not have much time to review the students' writing 
and select best pieces. Based on this experience, a procedure for giving 
teachers more advance notice of the upcoming portfolio assessment was 
developed for the 1992 NAEP Portfolio Study. It is hoped that, by giving the 
participating teachers in 1992 several months' notice, the 1992 results will 
be representative. 

Outline of this Report 

This report is divided into four sections. Chapter One describes the writing 
received from the students and information from participating teachers about 
the activities that generated the writing. Chapter Two explains the procedures 
used to evaluate the writing students submitted as well as the results of this 
evaluation. Chapter Three compares the results of the NAEP 1990 writing 
assessment with the analysis of participants' school-based writing samples and 
summarizes the lessons learned from this portfolio study. The last chapter 
contains a set of sample papers, further illustrating how the evaluative guides 
can be applied and presenting a sense of the range and depth of writing we 
received from participating students. 

Examples of the Narrative Scoring Guide 

Event Description (score of 1) Papers classified as event descriptions 
tell about one event. Basically, they say, "such and such happened." Some of 
the papers in this category give details about the setting and so appear to be 
more elaborate stories. However, they end with a description of a single event, 
rather than a series of events. The paper below, written by a fourth grader, is 
an example of a simple Event Description. 



[Handwritten student paper: fourth-grade Event Description example] 



Undeveloped Story (score of 2) Papers classified as Undeveloped 
Stories tell about a series of events. Basically, they say, "one day this happened, 
then something else happened, and then another thing happened." However, 
the events, as well as the setting and characters, are only briefly described. The 
writers give very few details about each event: the story is a listing of related 
events. 

These stories are similar to front-page newspaper reports, where the basic 
facts of a story are reported (who, what, when, where) but few details about 
why events happened are presented. For example, in the paper below, the 
fourth-grade writer uses one sentence to describe each event. 



[Handwritten student paper: fourth-grade Undeveloped Story example] 

Basic Story (score of 3) In papers classified as Basic Stories, the 
writers go one step beyond a simple listing of related events. One aspect of the 
story (the events, the characters' goals, or the setting) is somewhat developed. 
However, these stories lack a sense of cohesion and completeness. Events may 
be presented out of sequence, some aspect of the story may be confusing due 
to problems with syntax, or a key event may be unclear. For example, in the 
paper below, the fourth-grade writer describes a series of events and, at the 
beginning, develops a problem in some detail (a librarian who puts books away 
too quickly). However, the resolution to the problem, although humorous, is 
not well developed. 




[Handwritten student paper: fourth-grade Basic Story example] 

Extended Story (score of 4) Extended Stories go beyond Basic Stories 
in that many of the events in these stories are elaborated to some degree. This 
degree of development gives a sense of a sequence of distinct story episodes. 
Details are given about the setting, the characters' goals, problems to be 
solved, and the key events. Yet, these stories may be somewhat incomplete in 
that the characters' goals may be left unresolved or the problem posed in the 
story's opening never solved. The ending may not match the beginning or the 
story's ending may be inconsistent with the internal logic established 
throughout the rest of the story. Or, as in the example below (written by an 
eighth grader), they may be very satisfying, yet not elaborately developed. 

It is important to note that, while Extended Stories are not as elaborated or 
complex as are Developed Stories and Elaborated Stories, they are successful 
stories — all of the key story elements and events are clearly presented. They 
are the simplest type of complete story on this scale. 




[Handwritten student paper: eighth-grade Extended Story example] 

Developed Story (score of 5) Developed Stories describe a sequence 
of episodes in which almost all of the events and story elements are some- 
what elaborated. Yet, one aspect of these stories is not well developed, such 
as the ending or a crucial event. In the example below (written by an eighth 
grader), each episode is somewhat developed, but could be further elaborated. 



[Handwritten student paper: eighth-grade Developed Story example] 

Elaborated Story (score of 6) No papers were considered to be Elaborated 
Stories. To be classified as elaborated, stories had to present a sequence 
of episodes in which almost all of the events and story elements were well 
developed. Goals or problems introduced in the beginning were well resolved 
by the end, characters' motives were well developed, and the entire story was a 
cohesive, unified whole. 

In the example below, the eighth-grade writer of "The Black Rose" retells 
the plot of a Halloween movie. In it, the writer effectively presents each 
episode, leading to a spine-tingling ending. The only discordant note is the 
occasional switching of narrative voice between first person and third person. 
A revising of this story that included a consistent use of narrative voice would 
make this an example of an Elaborated Story. (As is, this story received a score 
of 5.) 
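
The six levels of the narrative scoring guide described above can be summarized as a simple lookup. The following is only an illustrative sketch: the names `NARRATIVE_SCALE` and `describe_score` are hypothetical and do not come from the NAEP materials; the scores and category names are those given in the guide.

```python
# Score levels and category names as listed in the narrative scoring guide.
NARRATIVE_SCALE = {
    1: "Event Description",
    2: "Undeveloped Story",
    3: "Basic Story",
    4: "Extended Story",
    5: "Developed Story",
    6: "Elaborated Story",
}


def describe_score(score: int) -> str:
    """Return the category name for a narrative score from 1 to 6."""
    if score not in NARRATIVE_SCALE:
        raise ValueError(f"narrative scores run from 1 to 6, got {score}")
    return NARRATIVE_SCALE[score]


# "The Black Rose" received a score of 5.
print(describe_score(5))
```

A lookup of this kind only labels a score; assigning the score still depends on a reader applying the descriptions above to each paper.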



[Handwritten student paper: "The Black Rose," eighth-grade example that received a score of 5] 

Performance Assessment 
An International Experiment 



While the extensive use of paper and pencil tests in the main International Assessment 
of Educational Progress (IAEP) assessments made it possible to achieve good coverage of the 
knowledge and skills that could be assessed with such instruments, experience in the United 
Kingdom had demonstrated that some types of performance assessment were feasible in 
national surveys of student attainment. Given this experience and the desirability of extending 
the curriculum coverage in IAEP, a limited, optional component of performance assessment 
was included in the 1991 survey. The assessment was developed for 13-year-olds only and 
included mathematics and science tasks to enable IAEP participants to experiment with 
performance assessment in an international context. The U.S. did not participate. 

The pages reproduced here show the science tasks used, along with the performance of 
the participating countries. 



Performance Assessment: An International Experiment was written by Brian McLean Semple, 
The Scottish Office, Education Department, and published by the International Assessment of 
Educational Progress, Educational Testing Service, Report No. 22-CAEP-06, July 1992. 

Copies of the report can be ordered from: 

Center for the Assessment of Educational Progress 
Educational Testing Service 
Rosedale Road 
Princeton, NJ 08541-0001 



The Center for the Assessment of Educational Progress (CAEP) is a division of Educational 
Testing Service devoted to innovative approaches to the measurement and evaluation of educa- 
tional progress. The present core activity of CAEP is the administration of the National Assess- 
ment of Educational Progress (NAEP), under contract from the U.S. Department of Educa- 
tion. CAEP also carries out related activities, including the International Assessment of Educa- 
tional Progress (IAEP), state assessments, and special studies. 

LIGHT-UP 



Task Descriptor 

To categorize objects according to their electrical conductivity by completing an electrical 
circuit; to explain why some objects enable a bulb to light; to predict whether an object in a 
sealed container would enable the bulb to light and to explain why (or why not). 

Equipment/Material 

An electrical circuit with a bulb and a 
gap with two contacts which could be 
bridged. Five objects as follows: 
wood strip, plastic strip, nail, foil 
strip and cardboard strip. Also an 
object (piece of copper wire) in a 
sealed, clear plastic box. 



Student Instructions 

Complete the circuit using the five objects in turn. List those objects that enable the bulb to 
light and explain why. Say whether you think object X would enable the bulb to light and 
explain why. 

Scoring Scheme 

Credit was given for identifying the nail and foil strip as conductors and for giving an 
explanation mentioning one of the following or its equivalent: objects conduct electricity, allow 
electricity/charge to pass, complete the circuit, are metal. Also credit was given for saying 
object X would enable the bulb to light and for giving an explanation as above. 

Problems 

There was a problem in some Canadian provinces where the word "enable" in the instructions 
was read as "unable." Students who listed the nonconductors and provided an appropriate 
explanation were counted as giving the correct answers. 
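
The categorization part of this scoring rule, together with the allowance for students who read "enable" as "unable," can be sketched as a small check. This is an illustration under stated assumptions, not the scoring procedure actually used: the function name `listing_accepted` is hypothetical, and the sketch assumes a listing must match one of the two sets exactly. The explanation a student gave is judged separately and is not modelled here.

```python
# Objects from the Equipment/Material list, grouped by whether they conduct.
CONDUCTORS = {"nail", "foil strip"}
NONCONDUCTORS = {"wood strip", "plastic strip", "cardboard strip"}


def listing_accepted(listed: set) -> bool:
    """Accept a list of the conductors, or (for students who read
    "enable" as "unable") a list of the nonconductors."""
    return listed == CONDUCTORS or listed == NONCONDUCTORS


print(listing_accepted({"nail", "foil strip"}))
print(listing_accepted({"nail", "wood strip"}))
```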

Comments 

• Most students, 78 to 93 percent, categorized the objects correctly, but somewhat fewer 
were able to give a valid explanation for what they had done. 

• In four of the countries and provinces, more students recognized the conductivity of object 
X than had categorized the original objects correctly and in two of these countries and 
provinces (and three in total), more students gave a valid explanation for their decision. 
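
The results for each task are reported as a percentage followed by its standard error in parentheses. One common way to read such a pair is to form an approximate 95 percent interval of the estimate plus or minus 1.96 standard errors. This relies on the usual normal approximation, which is an assumption here; the report does not say how its standard errors were meant to be used. The function name and the example values below are illustrative only and are not taken from the charts.

```python
from typing import Tuple


def approx_95_interval(percent: float, std_error: float) -> Tuple[float, float]:
    """Approximate 95% interval: estimate +/- 1.96 standard errors,
    clipped to the 0-100 percentage scale."""
    half_width = 1.96 * std_error
    low = max(0.0, percent - half_width)
    high = min(100.0, percent + half_width)
    return (low, high)


# Illustrative values only: an estimate of 80 percent with a standard error of 2.5.
low, high = approx_95_interval(80.0, 2.5)
print(f"{low:.1f} to {high:.1f}")
```

Read this way, two populations whose intervals overlap heavily should not be treated as clearly different on that task.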




Percentage of Correct Responses (with Standard Errors) 

[Chart: percentage of correct responses, on a 0 to 100 scale, for Alberta, British Columbia, Nova Scotia, Ontario-English, Ontario-French, Saskatchewan-English, Saskatchewan-French, England, Scotland, Soviet Union, and Taiwan. First panel: identified nail and foil strip; provided valid explanation. Second panel: identified X as conductor; provided valid explanation.] 

CIRCUIT 



Task Descriptor 



To construct an electrical circuit as represented in a drawing by selecting appropriate 
components and connecting them correctly. 

Equipment/Material 

Drawing of the circuit and a set of components as listed below. (Number of components 
required to construct the circuit are shown in parentheses.) 

3 batteries (2) 
2 battery holders (2) 

5 bulbs (2) 
2 bulb holders (2) 
1 switch (1) 

6 wires with clips (5) 






Student Instructions 

Use the objects on the card to make up the circuit shown in the drawing. You may not have to 
use all of the equipment. When your circuit matches the diagram, close the switch and see 
what happens. Raise your hand and ask the administrator to check your work. 

Scoring Scheme 



Credit was given for the correct positioning of batteries and bulbs, and for using five wires to 
form a closed loop, thus enabling the bulbs to light. 



Problems 

A loose connection in a bulb holder in one of the kits used in Ontario prevented the two bulbs 
from lighting, but students were credited for constructing the circuit correctly. 

Comments 

Almost all students across participating countries and provinces completed this task 
successfully. 



Percentage of Correct Responses (with Standard Errors) 

[Chart: percentage of students, on a 0 to 100 scale, with batteries, bulbs, and wires in correct position, shown for Alberta, British Columbia, Nova Scotia, Ontario-English, Ontario-French, Saskatchewan-English, Saskatchewan-French, England, Scotland, Soviet Union, and Taiwan.] 

FILTER 
Task Descriptor 

To set up apparatus for filtering, as shown in a drawing, and to filter some muddy water. 
Equipment/Material 

A ring stand, a funnel, a beaker, and a folded filter paper. Also, a bottle of muddy water. 




Student Instructions 

Set up the apparatus as shown in the drawing above, put the folded filter paper into the funnel, 
and pour a small amount of muddy water into the funnel. Raise your hand when you have 
gotten some clear water and ask the administrator to check your work. 

Scoring Scheme 

Credit was given for the apparatus being assembled correctly, the filter paper being inserted 
correctly in the funnel, and for any clean water obtained. 

Problems 

In the pilot-testing, filter papers were supplied unfolded and this caused widespread problems, 
but in the final assessment they were pre-folded. 

Comments 

• There was a high success rate, 86 to 100 percent correct, in assembling the apparatus; but 
more difficulty was experienced with correctly inserting the filter paper, where success 
ranged from 63 to 89 percent correct. 

• Despite problems with the filter paper, many students were still able to obtain some clean 
water. 

Percentage of Correct Responses (with Standard Errors) 



[Chart: percentage of correct responses, on a 0 to 100 scale, for Alberta, British Columbia, Nova Scotia, Ontario-English, Ontario-French, Saskatchewan-English, Saskatchewan-French, England, Scotland, Soviet Union, and Taiwan. Series: apparatus correct; filter paper correct; clean water obtained.] 

MAGNET 



Task Descriptor 

To use a magnet to identify magnetic and non-magnetic items and then to explain the 
difference between them. 

Equipment/Material 

A magnet and the following seven objects: plastic button, iron or steel washer, steel paper clip, 
iron nail, glass marble, plastic rod and copper coin. 




Student Instructions 

Test the objects with the magnet and divide them into two groups. List the objects in the two 
groups and explain what makes the objects in the two groups different. 

Scoring Scheme 

Credit was given for grouping the objects correctly. Four categories of explanations were 
recorded: namely, that one group was made of iron or steel, that one group was attracted by the 
magnet, that one group was made of iron and steel and was attracted by the magnet, and any 
other explanation. 

Comments 

• Generally students performed the categorization task well, scores ranging from 86 to 95 
percent correct; but 10 percent of the students across all countries and provinces gave 
irrelevant explanations. 

• Omission rates were generally low, but there was a 6 percent omission rate in England. 

• The most frequent explanation for students' categorization was that one group of objects 
was attracted by the magnet; 79 percent of the students across participating countries and 
provinces gave this response. Fewer, between 4 and 30 percent, mentioned iron or steel, 
and this varied considerably among countries and provinces. 

Percentage of Correct Responses (with Standard Errors) 



[Chart: percentage of students, on a 0 to 100 scale, who categorized the nail, washer, and paper clip in one group, shown for Alberta, British Columbia, Nova Scotia, Ontario-English, Ontario-French, Saskatchewan-English, Saskatchewan-French, England, Scotland, Soviet Union, and Taiwan.] 

Percentage of Students Giving Particular Explanations (with Standard Errors) 

[Chart: percentage of students in each of the same populations giving each explanation: iron or steel; attracted by magnet; iron or steel and attracted by magnet.] 

INDICATORS 



Task Descriptor 

To determine whether three solutions contain glucose, starch, or glucose and starch using 
indicators for glucose (test strip) and starch (iodine solution). 

Equipment/Material 

Three dishes labelled A, B and C containing the standardized, unknown solutions. Glucose test 
strips and iodine solution in a dropper bottle. 




Student Instructions 

The glucose test strip will turn from yellow to green on contact with a solution containing 
glucose and the iodine solution will turn blue-black when starch is present. The dishes A, B 
and C contain three different solutions which you are to test for glucose and starch using the 
indicators. Take the dish filled with solution A and dip the glucose test strip into it. Let the 
test strip dry. Add a drop of iodine solution to dish A. Observe all the results, report what 
solution A contains and repeat for solutions B and C. 

Scoring Scheme 

Credit was given for identifying glucose only in solution A, starch only in solution B, and 
glucose and starch in solution C. 

Comments 

• The differences in performance among countries and provinces were substantial in all three 
tasks. For each task, the difference in the scores of the highest and lowest performing 
populations was at least 20 points. 

• Success rates in identifying the solution containing only glucose were highest, averaging 63 
percent correct across participating countries and provinces. Those for the starch-only 
solution and the mixture of both averaged 53 and [illegible] percent correct, respectively. 

Percentage of Correct Responses (with Standard Errors) 



[Chart: percentage of correct responses, on a 0 to 100 scale, for Alberta, British Columbia, Nova Scotia, Ontario-English, Ontario-French, Saskatchewan-English, Saskatchewan-French, England, Scotland, Soviet Union, and Taiwan. Series: Solution A; Solution B; Solution C.] 

FLOAT 



Task Descriptor 

To select correct observations about flotation from two sets of objects. 
Equipment/Material 

Two small glass jars labelled X and Y containing clear liquids and identical plastic toys, one 
floating (in jar X) and one submerged (in jar Y). 



Student Instructions 

Look carefully at the two jars - you may pick them up. Five other students looked at these 
jars and made the following statements. Which statements are observations, that is, they 
describe what the student actually saw? 

A. I see a toy floating in jar X. 

B. I see a toy floating in jar Y. 

C. I see a toy in jar X that is made of a different plastic than the toy in jar Y. 

D. I see jars containing colourless liquids and coloured toys. 

E. I see a toy in jar Y that is heavier than the toy in jar X. 



Scoring Scheme 

Credit was given for circling correct statements A and D and not circling incorrect statements 
B, C, and E. 
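
The all-or-nothing rule above can be written as a short set comparison. This is a minimal sketch for illustration: the names `CORRECT`, `INCORRECT`, and `earns_credit` are hypothetical and not part of the IAEP scoring materials.

```python
CORRECT = {"A", "D"}         # statements scored as observations
INCORRECT = {"B", "C", "E"}  # statements scored as not being observations


def earns_credit(circled: set) -> bool:
    """Credit requires circling both A and D and none of B, C, or E."""
    return CORRECT <= circled and not (INCORRECT & circled)


print(earns_credit({"A", "D"}))       # both correct statements, nothing else
print(earns_credit({"A", "D", "E"}))  # also circled E, so no credit
```

Because a single extra circle removes all credit, a rule of this kind yields much lower success rates than the rates for recognizing each statement separately.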

Comments 



• The percentages of students who circled both correct statements and none of the incorrect 
ones were low, ranging from 10 to 34 percent. 

• Most students recognized statements A and D as observations. Almost all students 
recognized that statement B, the opposite of statement A, is not an observation. Most 
students recognized that statement C, that the two toys were made of different plastic, is 
an incorrect statement, probably because the toys looked so similar. However, statement E, 
that the masses of the toys were different, proved attractive to many students and they circled 
it, even though they had no way of knowing the mass of the two toys. 

Percentage of Correct Responses (with Standard Errors) 



[Chart: percentage of students, on a 0 to 100 scale, with statements A and D circled and statements B, C, and E not circled, shown for Alberta, British Columbia, Nova Scotia, Ontario-English, Ontario-French, Saskatchewan-English, Saskatchewan-French, England, Scotland, Soviet Union, and Taiwan.] 

Percentage of Students Recognizing Incorrect Statements (with Standard Errors) 

[Chart: percentage of students in each of the same populations recognizing each incorrect statement: Statement B; Statement C; Statement E.] 

TABLET IN WATER 
Task Descriptor 

To observe and record all the changes which take place when a tablet dissolves in water. 
Equipment/Material 

Water supply, plastic cup, and fruit-flavoured, coloured fizzy tablets. 



Student Instructions 

Observe what happens when the tablet is in the water. Write as many different things as you 
notice. 

Scoring Scheme 

Credit was given for all appropriate visual, auditory, and olfactory changes recorded. 
Comments 

• The changes that were recorded by most students were in the size of the tablet, the colour 
of the water, and the bubbling of gas. These are all visual changes and it may be that the 
use of the word "observe" in the students' instructions biassed their responses towards such 
changes. However, there were substantial differences in the reporting of different visual 
changes and among different countries and provinces. 

• A notable feature was a wide range in the reporting of the fizzing sound as the tablet 
dissolved, from 3 percent in Taiwan to 50 percent in Nova Scotia. 

• At least one-half of the students in participating countries and provinces mentioned four or 
more observations, except in the Soviet Union and Taiwan, where the percentages were 45 
percent and 34 percent, respectively. 





Percentage of Students Mentioning Correct Observations (with Standard Errors) 

[Chart: percentage of students, on a 0 to 100 scale, mentioning 2 to 3 observations and four or more observations, shown for Alberta, British Columbia, Nova Scotia, Ontario-English, Ontario-French, Saskatchewan-English, Saskatchewan-French, England, Scotland, Soviet Union, and Taiwan.] 

Percentage of Students Mentioning Specific Observations (with Standard Errors) 

[Chart: percentage of students in each of the same populations mentioning specific observations, including colour and sound.] 



SEEDS 
Task Descriptor 

To categorize two different types of seeds according to their size, shape and colour. 
Equipment/Material 

Three groups of seeds labelled 1, 2, and 3 and two containers labelled X and Y with the 
"unknown" seeds. 




Student Instructions 

Your task is to decide in which group seeds X and Y belong and to state your reasons. Look 
carefully at the three groups of seeds 1, 2, and 3 - you may pick up the containers. 

Scoring Scheme 

Credit was given for relating seeds X to group 3 and mentioning colour and size. Also for 
relating seeds Y to group 2 and mentioning shape. 

Problems 

Seeds could not be included in the kits and sometimes it proved impossible to obtain 
comparable seeds in all of the countries involved. Because the colour of the sesame seeds 
varied (sometimes white, sometimes yellow), comparable scores could not be obtained for the 
first part of the task, categorizing seed X. 

Comments 

• In general, high proportions of the students were able to assign seeds Y to the correct
group, ranging from 70 percent in the Soviet Union to 92 percent in English-speaking
Saskatchewan.

• Many fewer students provided the correct reason for their categorization. The percentages
doing so ranged from 18 to 50 percent.

[Figure: Percentage of Correct Responses for Container Y (with Standard Errors). Populations shown: Alberta, British Columbia, Nova Scotia, Ontario-English, Ontario-French, Saskatchewan-English, Saskatchewan-French, England, Scotland, Soviet Union, Taiwan. Series: Group 2; Mentioned shape (flat and oval). Axis: percent, 0 to 100.]

"Measuring What's Worth Learning"

and

"Mystery Graphs"



Measuring Up: Prototypes for Mathematics Assessment 

At the National Summit on Mathematics Assessment held at the National Academy of 
Sciences in 1991, Governor Roy Romer challenged the mathematics community to show
through realistic examples just what we mean by "standards-based" education. Measuring Up 
contains 13 assessment prototypes that exemplify changes called for by the National Council of 
Teachers of Mathematics (NCTM) Curriculum and Evaluation Standards for School Mathemat- 
ics. Two sections are reproduced here. 



Copies of Measuring Up: Prototypes for Mathematics Assessment, Mathematical Sciences
Education Board, National Research Council, Washington, DC: National Academy Press,
1993, may be purchased from: 

National Academy Press 
2101 Constitution Avenue, NW 
Washington, DC 20418 
Adapted with permission from Measuring Up: Prototypes for Mathematics Assessment.
Copyright 1993 by the National Academy of Sciences. Courtesy of the National Academy Press, 
Washington, DC. 
Measuring What's Worth Learning

The spotlight of educational reform continues to sweep across the
stage of mathematics. First curriculum, then teaching, and now
assessment have come under intense professional and public scrutiny.
Amid deteriorating public confidence in the quality of American
education, the mathematical community is addressing multiple
challenges to articulate and implement effective standards in the
key arena of testing, assessment, and accountability.

In the center of the assessment stage are three elements contesting
for leadership. Conventional testing offers comfortable
short-response tests on traditional content that are taken by
millions of students every year. Reformers, including authors of the
two K-12 Standards documents from the National Council of Teachers
of Mathematics (NCTM), call for fundamental change: different in
content, in format, and particularly in spirit. To this
well-rehearsed contest of traditionalist vs. reformist has now been
added a third movement arriving from outside the educational
community: the political call for assessment of progress towards our
nation's new standards in mathematics education.

In the decade since publication of A Nation at Risk, the 
United States has moved a long way toward a new consensus 
for education. Talk of national standards, once taboo, is now 
commonplace; so too is talk of alternative school structures 
and innovative licensure for teachers. It is now time to develop
a new national understanding of standards-based performance
as the true measure of educational progress.

Throughout this decade, mathematics has led the way in 
educational reform. The 1989 MSEB publication Everybody 
Counts was followed in just two months by publication of the 
NCTM Curriculum and Evaluation Standards for School 
Mathematics, with its theme of developing mathematical 
power in all students. Undergirding these reports are three fun- 
damental principles of testing, assessment, and accountability: 

• Tests should measure what's worth learning, not just
what's easy to measure.

• Progress depends on constant correction based on
feedback from assessment.

• Schools are accountable, both to taxpayers and to students. 

Even as the renewed public scrutiny compels educators to 
demonstrate that children are learning, the NCTM's Standards 
require new ways of measuring what is being learned. Because 
the linkage between tests and teaching is so close, it is vitally
important for the United States that assessment be based on instru- 
ments that are properly aligned with the goals of the Standards. 

The Challenge 

[Margin note: Why we are doing this]

At the National Summit on Mathematics Assessment in April 1991,
Governor Roy Romer, in his capacity as Co-chair of the National
Education Goals Panel, challenged the mathematical community to show
the nation what mathematics educators mean by mathematical power and
what new and more demanding standards will mean for our young people.
One month later, the MSEB authorized creation of prototypes of tasks
that could be used to assess fourth-graders' mathematical skills and
knowledge, thereby providing examples of what children educated
according to the new standards should be able to do. They wanted to
be sure that the voice of mathematics was heard early and clearly in
the assessment reform movement. The MSEB determined that it should be
prepared to show, by example, the type of assessment exercises that
would be appropriate to measure our nation's progress toward the
goals of mathematics education.

To create the prototypes, the MSEB subsequently convened
a small writing group of mathematics educators, teachers, and
mathematicians. Taking up Governor Romer's challenge, the 
writing group created a sampler of tasks to encompass many of 
the goals for mathematics instruction that are expressed in the 
NCTM Standards. These tasks, which illustrate what a stan- 
dards-based education really means, have been pilot tested on 
a limited basis in four states. Many have been revised, often 
more than once, but all can benefit from continued improve- 
ment and adaptations. 

Readers who skip ahead will see that these prototypes are 
not only innovative and challenging but also just plain fun. 
Teachers, children, and even parents should find these tasks 
both engaging and surprising. We invite readers to try them, 
either before or after reading the surrounding analysis. 
The Criteria

[Margin note: What we are trying to do]

Not surprisingly, the MSEB writing group debated extensively the
criteria for prototypical assessment tasks. They faced the pioneer's
challenge: to use incomplete information as a basis for decisions
whose consequences are difficult to foresee.

From these discussions emerged several criteria that helped shape
the nature and selection of prototypes in this volume:



• Mathematical content: The tasks should reflect the
"spirit" of the reform movement, but not necessarily be
limited by particular curricular content, present or
planned. Many of the tasks should incorporate a variety
of mathematics, particularly in areas such as statistics,
geometry, and probability that are least often
emphasized in traditional K-4 programs.

• Mathematical connections: Everyone involved in the
mathematics reform movement, from classroom teachers
to national policy makers, agrees on the importance
of connecting mathematics to science, to social science,
to art, to everyday life, and to other parts of mathematics.
Accordingly, the prototypes should develop
links with science, with the visual arts, and with the
language arts.

• Thoughtful approaches: Insofar as possible, the tasks 
should promote "higher-order" thinking. Just as the
verbs explore, justify, represent, solve, construct, dis- 
cuss, use, investigate, describe, develop, and predict 
are used in the Standards to convey "active physical 
and mental involvement of children" in learning mathe- 
matics, they are appropriate to seek in assessment 
activities as well. Further, given a choice between cog- 
nitive complexity and simplicity, the focus of these 
tasks should be on the former. 

• Mathematical communication: The tasks should 
emphasize the importance of communicating results — 
not simply isolated answers, but mathematical represen- 
tations and chains of reasoning. Children should have 
opportunities to work in groups to explain their thinking 
to others, and to write explanations of their approaches. 

• Rich opportunities: The tasks should let children solve 
problems via a variety of creative strategies; demon- 
strate talents (artistic, spatial, verbal) beyond those nor- 
mally associated with numerical mathematics; invent 
mathematics that (to them) is new; recognize opportu- 
nities to use and apply mathematics; and show what 
they can do (as opposed to what they cannot do). 

• Improved instruction: The tasks should have the potential
for influencing instruction positively, both in content
and in pedagogy. Teachers who use these tasks
should become better teachers as a result of the experience;
children who participate should emerge with
increased self-confidence and heightened expectations
for future mathematics courses.
The Caveats

These tasks are prototypes, not tasks ready for immediate 
administration to fourth-grade students. They are intended to 
illustrate possible directions for new assessment instruments, 
not to be an example of a real assessment. Certainly they 
should be viewed as work in progress, not as fully completed 
blueprints. 

Criteria related to cost, efficiency, and immediate feasibility 
were deliberately not imposed on the work of the writing 
group. These are important considerations for implementation, 
but not for this volume. The MSEB goal for Measuring Up is to 
promote long-term change, not to write assessment material for 
current courses. 

[Margin note: What we are not trying to do]

As assessment instruments, these prototypes are intended for
children who have had the full benefit of a Standards-caliber
mathematical education in kindergarten through fourth grade. Hence
the tasks as presented here will be more appropriate, generally
speaking, for students of some time in the future. From the
perspective that has historically dominated U.S. testing, these
prototypes illustrate directions for tomorrow, rather than tasks for
immediate practical use. From a perspective more common in Europe
(where tests, appropriately publicized in advance, set targets for
teaching and learning) these prototypes do serve the immediate
purpose of defining appropriate goals for fourth-grade instruction.

Moreover, the prototypes, as a set, are not intended to illus- 
trate a single assessment that treats all of the mathematics 
important at the fourth-grade level. Much that is important in 
the curriculum is not covered adequately in the particular 
examples chosen for this volume. Nevertheless, to expand our 
view of appropriate mathematics goals for the primary grades, 
these tasks provide more opportunities for children to demon- 
strate their ideas in areas often missing from the curriculum 
(e.g., data, geometry) than in areas already well entrenched 
(arithmetic). The imbalance in these examples reflects our 
desire to illustrate the new, not an effort to reshape the curricu- 
lum to fit this particular set of examples. 

These prototypes, which are tasks to be done in time spans 
ranging from one to three class periods, represent only one of 
many important forms of assessment. Other forms of assessment
are essential for a balanced program, including projects
(extended pieces of mathematical investigation designed to 
take a substantial block of time), portfolios (structured collec- 
tions of student work gathered over a long time period), and 
tests (time-limited responses to shorter tasks). Some of the ref- 
erences at the end of this volume (e.g., Pandey [1991]; 
Stenmark [1989]) describe these alternative approaches.
The Audience

[Margin note: Whom we are trying to reach]

Many readers of Measuring Up will be persons who are professionally
concerned with mathematics education, particularly developers of
tests and other assessment instruments. For such people, both those
who work within commercial test development companies as well as
those in educational settings at the state or local levels,
Measuring Up should stimulate development of new approaches to
assessment that reflect the broad goals of the nation's standards
for mathematics education.

If mandated assessments evolve to resemble more closely
the ones suggested in this book, it is clear that different 
approaches to instruction and testing will be needed. Hence 
school administrators and educational policy makers will also 
be affected by the changes implicit in these prototypes. The 
tasks will convey to the audience of policy makers and educa- 
tion leaders what mathematics educators mean by assessment 
reform. 

A third audience for Measuring Up consists of classroom 
teachers, and not just those at the fourth-grade level. It is only 
natural that many practicing elementary school teachers may find 
some of these tasks to be somewhat daunting, especially if their 
students have not had the mathematical preparation that the tasks 
assume. Teachers should look at the prototypes not as current 
expectations, but rather as goals to aim for. The prototypes can 
be viewed both as examples of what tomorrow's assessment in 
mathematics might be like, and as examples of what today's goals
for instruction should be like. In the meantime, teachers can use 
them as ideas for instructional activities for today. (A list of 
resources for teachers including the names and addresses of con- 
tacts in each state appears at the end of the volume.) 

Another audience is the community of university-based edu- 
cators who are responsible for the pre-service education of 
prospective teachers. They will find Measuring Up to be a 
source of ideas to use today for connecting the tenets of the 
mathematics education reform movement to classroom practice. 

Finally, of course, the ultimate audience for these new 
assessment tasks and the ideas that underlie them is the ele- 
mentary school children for whom the tasks were designed. 
The tasks provide good examples of challenging mathematical 
problems and situations that effective teachers can use even 
now as part of their instructional strategies. Today's children 
can begin to see the challenge in authentic mathematical prob- 
lems even before tomorrow's tests give them an opportunity to 
demonstrate their accomplishments. 



The Prototypes 

Measuring Up contains thirteen assessment prototypes that 
exemplify changes called for in the Standards. In some cases 
the particular settings or contexts for the tasks are original, 
while in other cases some aspect of the task has appeared in 
another form previously. 

[Margin note: What we have accomplished]

The tasks in Measuring Up are intended for a largely hypothetical
audience: fourth-grade children who have had a K-4 mathematics
experience fully consonant with the NCTM Standards. Unfortunately,
very few U.S. fourth graders have had the benefit of such an
education. This is, of course, the whole point of the reform effort.
One would not expect many of today's fourth graders to do very well
on these tasks. Nonetheless the aim was to keep the tasks accessible
to most of today's fourth graders; they should at least be able to
understand what the tasks are about and become engaged in working on
them.
Too often test questions and assessment tasks are presented
solely in written form, which may be a burden for poor readers 
and for children whose first language is not English. Such chil- 
dren might not be able to respond to the tasks in a way that 
shows their true level of mathematical knowledge or skills. 
Many alternative presentations are possible: videotaped intro- 
duction; teacher-taught introduction; computer-based presenta- 
tion; and presentation using manipulative materials. The proto- 
types illustrate each of these alternative modes of presentation, 
and two of the tasks are written in Spanish as well as in 
English. 

Notwithstanding the possible variety in presentation, the
prototypes in Measuring Up adhere to a certain uniformity of
structure. Most are organized as a sequence of questions, often of
increasing difficulty. On the one hand, this provides a structure
around which the child's problem solving must be organized. On the
other hand, this sequence of questions may suggest that the
problem-poser, rather than the problem-solver, is in charge of the
problem-solving process. Although other forms of organization are
certainly possible, these prototypes provide sufficient imposed
structure to help the mathematically less sophisticated student get
started and show what he or she can do, while allowing plenty of
open-ended space at the top to challenge the more advanced student.
Even though the questions within a task often grow in difficulty,
many of the tasks involve problem solving, reasoning, and
communication right from the beginning. These are important aspects
of mathematics for all children.

Just as the tasks are presented in several formats, so they are
also designed to give children a chance to respond in a variety of
modes, perhaps by constructing an object or by creating a pattern on
a computer screen. One important response mode that is not
specifically included in these prototypes is that of the child
talking individually to a teacher, explaining his or her solutions
orally rather than in written form. Pilot testing of the tasks has
shown that children who have not had considerable experience in
organizing their thoughts on paper find it much easier to tell
someone else what they are doing than it is to record it in writing.
Teachers who use tasks like the ones in this collection for their
own informal assessment of how children are progressing
mathematically will want to supplement written responses with spoken
ones. In fact, asking a child to explain a solution in two forms
(spoken and written) can help the child to sharpen and deepen both
responses.

These prototypes can be used either for informal class- 
room-based assessment by an individual teacher, or for more 
formal external assessment, although certain modifications may
be necessary to make the tasks suitable for a given purpose.
All of the prototypes call for responses that go well beyond
simple numerical answers, and most require the student to
explain underlying patterns, relationships, or reasoning. As a
result, the same activities can be useful to an individual teacher 
as she or he tries to discern more deeply how students are pro- 
gressing mathematically, and to a district to discern the effec- 
tiveness of its instruction. 

As the NCTM Standards urge, assessment should be embed- 
ded in instruction, so that most children would not recognize 
the assessment activity as a "test." Even when certain tasks are 
used as part of a formal, external assessment, there should be 
some kind of instructional follow-up. As a routine part of class- 
room discourse, interesting problems should be revisited, 
extended, and generalized, whatever their original sources. 

Increasingly, educators are recognizing the value of having 
children work together in groups. Certainly group work exemplifies
the NCTM's goal of stressing mathematics as a means of
communication. Some of the tasks in Measuring Up are 
designed to be carried out in small groups, while in other cases, 
small groups are certainly a reasonable option. Continuing 
experimentation will be required to determine how the children 
can best be grouped for assessment tasks like these, and how to 
weigh individual vs. group work in performance evaluation. 
In several cases the problems suggested here for fourth
grade could also be asked in the eighth or even the twelfth 
grade, although naturally the expected sophistication and com- 
pleteness of the responses would be very different. If a mathe- 
matical task is genuinely interesting and worthwhile for fourth 
graders, there is no reason why it should not be worthwhile for 
older children, or even for adults. 



The Tryouts 

[Margin note: What we learned from children]

Each prototype was tested on several score fourth-grade students in
a number of different locales. These "tryouts" were not designed to
be either representative or comprehensive, but to aid in improving
the tasks. This they did, but they did much more as well. By
observing how students react to the prototypes, we learned much
about the gulf that separates current students from the goals of the
Standards. We also learned that we are novices on how these new
forms of assessment will work in the classroom.

Three examples can illustrate the types of surprises that all 
teachers will encounter as they begin to explore and extend 
these prototypes: 

• In a few cases the tasks as originally presented seemed 
not to be sufficiently challenging. One example is the 
"Lightning" task in which a fairly large proportion of 
the students could easily handle the map-reading 
requirements. So a question dealing with locating a 
lightning bolt that is a given distance from two
observers was added. 

• Sometimes a proposed task yielded no information of any 
interest at all. In "Bridges," there was originally a more 
open-ended question in which students were to create 
their own bridges. Nobody created anything that went 
even a little bit beyond the two-support, single-span 
examples. This may have been due to the wording of the 
question, to the backgrounds of the particular students, or 
to some other factor. This lack of inventiveness and perseverance
is something worth pursuing since creativity is
an essential part of doing mathematics, for fourth graders
as well as for everyone else. However, since the question
produced virtually no information, it was dropped.

• One whole prototype was dropped entirely. It was a 
task on what is known as "Pick's Theorem," which
relates the area of a polygonal region on a geoboard to 
the number of nails on the boundary and in the interior 
of the region. The task was extremely open-ended and 
required extensive interaction between the teacher and 
individual students or small groups of students. Even if 
one assumed (as we do) that the teachers involved in
the assessment are uniformly well versed in the sub- 
tleties of the underlying mathematics, there seemed to 
be no way of separating the effects of the teacher from 
the progress that individual students might make on the 
task. Perhaps such a task could be viewed as an assess- 
ment of the teacher-class unit, but in any case it seemed 
to be too problematic to include in this collection. 

The Format 

Each of the thirteen tasks is presented using the same outline. 
After the title, there is a suggested time allotment, which can vary
from one to three class periods. This is followed by a suggested 
student social organization, although in many cases the task does 
not depend substantively on a particular form of grouping. 

[Margin note: How we present the prototypes]

Next comes the task itself. First there is a description of assumed
background. In most cases this refers to specific aspects of the
children's mathematical background, assuming (hypothetically, of
course) that the children have had a K-4 education that fully meets
the NCTM Standards. Second, there is a section on presenting the
task, which details exactly what the teacher (or other assessor)
should do. Finally, there is the student assessment activity. Very
often this involves one or more sheets of paper on which students
record their responses. (To reproduce these pages, which were scaled
to the 7" x 10" page of this volume, the copying machine should be
set to magnify them appropriately.)
The next major section is a rationale for the mathematics education
community [...] as well as more general messages about mathematics
education that the task is intended to convey.

There are two subsections of the rationale for the task. The first,
task design considerations, discusses some of the details behind the
task: why certain questions were phrased as they were, or why
particular numbers were chosen. The second, variants and extensions,
hints at other directions for instruction or further assessment.
These subsections are far from exhaustive, for often the tasks could
be starting points for weeks of instruction. One important message
conveyed by this section is that these particular prototypes are in
no way unique.

The next section describes a rough scoring system, what is called a
protorubric, for the task. It is widely recognized that an
assessment task by itself means little without an indication of how
children's responses would be scored. In other words, an important
component of an assessment task is the rubric [...] a variety of
answers [...] next section, the rubrics given here are necessarily
tentative and incomplete, whence "protorubrics."

Finally, in some of the tasks there is a section containing
references to relevant sources.



The Protorubrics 

[Margin note: How might fourth graders do?]

[...] commentary about scoring based on student work, for a number
of reasons [...] have had a mathematical education that is different
from what is commonly available in U.S. schools today.






Mystery Graphs 







Suggested time allotment 

Less than one class period 

Student social organization 

Students working alone 

Task 

Assumed background: This task assumes that the children have had
extensive experience in dealing with sets of data, and, in
particular, are familiar with interpreting data that are represented
in line plots.

Presenting the task: The teacher should distribute the student
materials and read enough of them to be sure that the children
understand the task. It is also important to stress that the
"classroom of fourth graders" is some other classroom, not theirs.
In the pilot, it was necessary to clarify that "cavities" in
question 1a refers to both filled and unfilled cavities.

Student assessment activity: See the following pages.







Name 



Date 



Look at the five graphs on the next pages. Each graph shows some- 
thing about a classroom of fourth graders. 

1. Which of the five graphs do you think shows: 

a. The number of cavities that the fourth graders have? 

b. The ages of the fourth graders' mothers? 

c. The heights of the fourth graders, in inches? 

d. The number of people in the fourth graders' families? 

2. Explain why you think the graph you picked for c is the one that
shows the heights of fourth graders. 



3. Why do you think the other graphs don't show the fourth graders'
heights? 
Rationale for the mathematics education community

This task puts a premium on looking at data sets, as
opposed to individual pieces of information. This is a fundamental
notion that should take an increasing role in the elementary
mathematics curriculum. The task also gives children
the opportunity to relate the graphical representations to their
own experiences as fourth graders.

Ordinarily, of course, one would want children to have 
plenty of chances to collect, display and analyze their own 
data, as the NCTM Standards suggest. If the task is going to 
fit within a single class period, however, there is not enough 
time to create five graphs for comparison. As a result, this 
task uses data that have already been collected from some 
hypothetical fourth grade. Clearly other assessment tasks 
(like the Hog Game and Buttons tasks in this collection) must 
include the collection, display, and analysis of data. 

Task design considerations: Children seem naturally interested
in data about people, particularly people of their own
ages; this is one reason for choosing a hypothetical fourth-grade
class as the basis of these data. The children will naturally
bring their own experiences with heights, ages, family
size, and dental health with them to the task. When using such
situations for assessment purposes, one must be careful to use
values of the data to which all the students can relate equally
well. There may be cultural variations in family sizes or in the
ages of fourth-graders' mothers, for example. To take this into
account, the ranges of Graphs 1 and 5 are large enough to
encompass every student's own family size and mother's age.

Questions similar to the one about heights could be asked 
about mothers' ages, family sizes, or cavities. The only reason 
such questions are not included is to save assessment time: the 
intent was to give an example of a task that could be done in 
less than one class period. 

To some extent, this is a task that measures children's prior 
knowledge about the real world — about how many inches tall 
they are, how old their mothers are, and so on. If one is con- 
cerned with children's abilities to connect mathematics with 
their world of experience, this is a reasonable expectation.
The style of drawing line plots should be the same as the
style to which the student is accustomed. 

Ideally, the five graphs should be displayed so that the stu- 
dent can see them all at once. 

Variants and extensions: A natural instructional follow-up 
to this task is to ask the students to compile data on heights, 
cavities, etc., from their own class, to compare with the data 
given. 

Using just the data presented here, one could pose prob- 
lems like: "Suppose Graph 2 really did show heights in inches. 
Whose heights could they be?" "Suppose Graph 3 showed the 
ages of the mothers of students in some grade level in our 
school. Which grade could that be?" "What other kinds of 
data could Graph 1 be showing?" 

Protorubric 

Characteristics of the high response:

The high response shows a full understanding of the relationship
between the graphs and the data they represent.

The responses for question 1 are all correct (a. 4; b. 5; c. 3;
d. 1). Questions 2 and 3, taken together, should explain that
Graph 3 shows a reasonable range of fourth graders' heights,
and that ranges of data in the other graphs are not as reasonable.
The only real alternative candidate for the heights is
Graph 2, but that would imply that there are fourth graders
who are six feet tall.

Characteristics of the medium response:

Graph 1 and Graph 4 are interchanged (number of cavities and number
of family members); or Graph 2 is used in place of Graph 3 or
Graph 5; or Graphs 3 and 5 are interchanged. Nonetheless, graphs
showing the correct general orders of magnitude are selected. Some
portions of the student's justifications are reasonable.

Characteristics of the low response: 

At most one graph is chosen 
that shows totally unrealistic data 
(e.g., Graph 5, with a range from
24 to 53, is selected for the num- 
ber of people in the families). 
Responses to questions 2 and 3 are 
missing or indicate that the student 
cannot interpret the graphs, or they 
do not show any reasonable sense 
of the magnitudes of more than 
one of the items. 

Reference 

An earlier version of this task was developed by TERC 
(Cambridge, MA) for Education Development Center (Newton, 
MA).

"Piloting Pacesetter:
Helping At-Risk Students Meet High Standards"

by Thomas W. Payzant and Dennie Palmer Wolf 
Educational Leadership, February 1993



PACESETTER™, a new program of the College Board, is designed to raise expectations 
and improve performance of all American students. The program will provide secondary 
school course frameworks and related assessments in five subject areas, supported by profes- 
sional development opportunities for teachers. All elements are being developed in cooperation 
with members of the leading national subject matter associations. The mathematics offering 
will be piloted in 1993-94, followed by offerings in English, world history, science, and Spanish.

The following article discusses PACESETTER™ as it is being pilot tested in the San 
Diego City Schools. 



Payzant, Thomas W. and Dennie Palmer Wolf (February 1993). "Piloting Pacesetter: Helping
At-Risk Students Meet High Standards," Educational Leadership, 50, 5:41-45. Reprinted
with the permission of the Association for Supervision and Curriculum Development. 
Copyright © 1985 by ASCD. All Rights Reserved. Reprinted with the permission of Thomas 
W. Payzant.

Piloting Pacesetter: Helping At-Risk
Students Meet High Standards

Thomas W. Payzant and Dennie Palmer Wolf



The San Diego City Schools, 
in partnership with the 
College Board, are piloting 
a program that seeks to 
prepare all students for
the educational demands
beyond high school.

Martin is 14. He reads on a 4th
grade level. His writing is 
simple — not because he 
doesn't have complex 
thoughts — but because he 
often struggles to find the English 
word he wants, and 40 minutes simply 
isn't enough time to think, draft, and 
revise. He wants to graduate from 
high school and enter a demanding 
job-training program at a local light 
and power company. As his father 
points out, "It's the difference between
$6 and $20 an hour all the rest of your
life."

But the entry test is no joke. To 
pass, you need the modeling skills to 
notice patterns and predict possible 
difficulties down the line in the 
machinery. That entails working with 
Boyle's and Charles' laws and algebraic
equations, and diagnosing
sources of possible error. And it 
doesn't end there. The company is 
looking for employees who are able to 
interview suppliers and examine
product information and forms written 
in Spanish, Japanese, or German. 

Access to High Outcomes

Gone are the days when graduation 
was a matter of going to school just 
enough to earn your Carnegie credits, 
or when any high school diploma 
could act as a passport. Public high 
schools, like those in San Diego, have 
as their major imperative helping all 
students prepare for postsecondary 
education — in colleges, in public 
service, or on the job, where the ticket 
is high-level competence, not atten- 
dance. The challenge is daunting. San 
Diego is an urban district of 125,000 
students with diverse racial, ethnic,
linguistic, and socioeconomic backgrounds.
Sixty different first
languages are spoken: 30 percent of
the students are Hispanic, 19 percent 
are Asian (with large Indo-Chinese 
and Filipino groups), 16 percent are 
African American, 34 percent white, 
and 1 percent "other."

In this context, we have had to 
rethink traditional approaches to 
equity. We can no longer be content 
solely with the simple arithmetic of 
inputs: racially mixed schools,
racially diverse teachers, classes of 
equal size, and bilingual opportunities 
for learning. We now face the chal- 
lenge of providing equity of outcomes. 
This is a tall order in American public 
schools, where there is a long-held 
belief that ability is distributed in a
normal curve pattern and, conse- 
quently, tracking is not only conve- 
nient, but appropriate. To uproot such 
deep beliefs demands a program of
serious and sustained change in atti- 
tudes, daily practices, curriculum, and 
assessment. 

In San Diego, we began five years 
ago by instituting a common core 
curriculum. Today, to be graduated
from high school, a student must take 
four years of English, three years of 
math, two of science, three of social 
studies, and must meet a fine arts
requirement. At the same time, we 
eliminated lower-level elective 
courses in English, math, and science. 
In mathematics, we established a pre- 
algebra/algebra sequence for all 
students, dropping all general, 
consumer, and business math courses. 

As promising as these innovations
are, by itself, this educational architecture
won't promise Martin the life he
and his family hope for. As a district,
we have to guarantee more than
coursework. We have to ensure that
Martin encounters mathematics that is
more than blind calculation and
formula juggling. However, no urban
district of our size and diversity has
the dollars to guarantee these
outcomes single-handedly. To provide
excellence for all demands partnerships.
We have to build on the standards
the National Council of
Teachers of Mathematics has developed,
and we have to join hands with
the social and natural sciences, as well
as technology, to figure out the "big
ideas" we ought to be concentrating
on. But most critically, partners can
help us think about the minute-by-minute
invention of actual courses that
can enable Martin, not merely remediate
him.

A Push-Pull Strategy 

If you say "College Board," most
people think of an elite gatekeeping
organization that decides who should
go where with how much scholarship
money. Not so. For the last decade, the
College Board has been an active,
vocal participant in school reform. Ten
years ago, the board published
Academic Preparation for College to
inform students, teachers, and families
about the necessary pathways to postsecondary
education. In the ensuing
years, the Educational EQuality
Project (E for equality, Q for quality)
developed workshops and publications
to get the word out that more students
deserved to attend, and could flourish
in, college. In a second decade, the
College Board has launched even 
bolder steps that add up to what has 
been called a "push-pull" strategy for 
major school reform. For example, the 
board, working with major educational 
foundations and a national consortium 
of researchers and teachers, has developed
EQUITY 2000, a demanding
program of pre-algebra, algebra, and
geometry designed to ensure that
minority students thrive in vigorous
high school mathematics programs.
If EQUITY 2000 accounts for the
"push" of this strategy, then the
College Board's Pacesetter initiative
accounts for the "pull." Through this
program, the College Board is
devoting major resources to determine
how to make the high-standards
curriculum, strong teaching, and
performance assessment, long associated
with its Advanced Placement
Program, a part of every high school 
student's experience. 

In San Diego, we have long used 
the AP Program as an equity tool.
Unlike gifted and talented programs,
these courses do not require cutoff
scores or special certification: any
willing student can enroll, and any
teacher can take up the challenge of
teaching a rigorous and inventive
course. Characteristically, such
courses focus on ideas and concepts
and on helping students display their
understanding in performance assessments
(for example, applying physics
principles to a novel situation and
predicting possible outcomes). AP
teachers often form professional
groups, exchanging syllabi and
teaching strategies and acting as
readers when the open-ended portions
of exams are graded. Not surprisingly,
we have found these courses work
toward equity, not elitism. They turn
out to be laboratories for thinking
through how excellent work might be
demanded of a full range of our
students. 

Consequently, when the College 
Board proposed Pacesetter, we were
more than interested. The project
called for developing yearlong courses
and associated assessments, along
with detailed plans for teacher
training, in mathematics, English,
world history, science, and foreign
language. Some courses would be
keystones designed to integrate and
deepen what students had learned
throughout high school. For instance,
in 12th grade science, students might
conduct projects about complex issues
that involved the merging of concepts
and problems from earth science,
biology, chemistry, and physics (for
example, situations in which the
chemical composition and the direction
of flow affect how toxic waste
takes its toll on the plant and animal
life in a particular ecological niche). In
12th grade English, students might
draw on their reading and insights
from American, British, and world
literature to trace the evolution of literature
written in English from its
origins to the present.

Other cornerstone courses, such as
those in intermediate Spanish and
world history, would suggest the kinds
of knowledge and skills students
should have midway through their
high school careers. These worthwhile 
outcomes that addressed the chronic 
problem of differential access to 
knowledge would be worked on with 
national committees of skilled 
teachers, researchers, and members of 
national curriculum organizations. At 
the same time, as part of Pacesetter, 
we would be linked to six quite 
diverse pilot sites: Broward County, 
Florida; Prince George's County, 
Maryland; Battle Creek, Michigan;
Charlotte-Mecklenburg, North 
Carolina; Irving, Texas; and Rutland, 
Vermont. 

From Declaration to Realization

San Diego already has a history of 
innovation and a wealth of partners.
Why take on more? 

We are in the midst of a vigorous 
national effort to set standards. We 
have national educational goals for the 
year 2000. The National Council of 
Teachers of Mathematics has 
published widely regarded content 
standards. Social studies, foreign 
language, arts, and language arts 
teachers are headed in the same direction.
Clearly, there is no shortage of
statements about what we ought to do. 
What we lack is a clear, concrete 
vision of how to reach those goals. 
The issue for us as an urban district is
not more declaration: it is realization. 

Pacesetter is centrally about realiza- 
tion. At this moment, national 
committees of classroom teachers are 
designing specific course frameworks. 
English teachers are hotly debating 
how to give students entry to the 
major "cultural conversations" of our
evolving culture. They are deliberating
how to provide a background knowl- 
edge of writers like Shakespeare, 
without ignoring the fact that contem- 
porary performances of Othello — set 
in Haiti or Los Angeles — could give 
new meaning to the play. Mathemati- 
cians are struggling to design a course 
that can offer pre-calculus students
what they need and teach other 
students how to be critical consumers 
and become skillful at quantitative 
reasoning. World history teachers are 
grappling with how to use the
concepts of climate, migration, and
technology to make the study of 
history increasingly more global. 
Each Pacesetter course will include: 

1. an outline of subject content and
anticipated learning outcomes devel- 
oped by leading teachers and special- 
ists from professional subject-matter 
associations and universities (for 
example, in the case of mathematics, 
the National Council of Teachers of 
Mathematics and the Mathematics 
Association of America); 

2. teacher-training and support 
activities keyed to the content outline 
for each course, including in-school 
assessment techniques, summer insti- 
tutes, workshops, and publications 
illustrating successful approaches to 
teaching diverse students; 

3. classroom assessments that help 
teachers monitor and shape instruction 
while providing ongoing feedback to 
students;

4. end-of-course assessments (such 
as projects or portfolios); 

5. a valid system for scoring end-of- 
course assessments on a state, 
regional, or local level. 

But realization — even at this early 
stage — has to get beyond lists of 
ingredients to new visions of learning, 
collaboration with teachers, and 
assessment.

Learning Outcomes for Students

Although the dust has hardly settled 
on the outcomes for Pacesetter English 
12, early collaborations between the 
College Board and the National 
Council of Teachers of English are 
sketching a lively picture of what's to 
come. Students will read both classic 
and modern works in order to understand
how we have framed and
currently think about major human 
issues. Literacy, in this context, 
becomes not just the ability to decode 
and record, but to interpret and create 
a wide range of cultural texts — 
speeches, performances, written litera- 
ture, documents, and even films. 

At the outset of the course, students 
might introduce themselves, then play 
back what they have said about themselves
and their lives, analyzing how
words, images, and performances 
create specific impressions. Turning 
from their own oral expression, 
students will read short works from 
literature written in English, exam- 
ining similar issues of self-presenta- 
tion and representation through 
language. Moving on to larger works, 
students might read and watch produc- 
tions of The Tempest, thinking about 
how self, familiar, and other (Prospero,
Miranda, Ariel, and Caliban) are
created through their own speech and 
what others say of them. 
Working in independent 
reading groups, students 
will investigate this 
legacy by looking at 
works as diverse as 
Othello or Toni
Morrison's Beloved. 
Throughout the course, 
students will explore 
focal works that have 
shaped the way English 
speakers make sense of 
the world: British works
as diverse as The Tempest
and Heart of Darkness,
American works that
could range from early
settlers' journals to The
Adventures of Huckleberry
Finn, as well as
African, Caribbean, and 
Indian literature. 



Throughout, students will take on the 
active roles of authors and critics, in 
addition to the familiar role of reader. 

New Opportunities for Teachers

The 12th grade mathematics course 
focuses on what happens when we 
confront complex quantitative data 
sets with the need to understand 
patterns, continue research, or reach 
conclusions. In this setting, teachers' 
roles shift dramatically. They become 
researchers constructing rich "case
studies" in which linear, exponential, 
and logarithmic functions can be 
applied to problems in fields like 
industrial design, economics, and 
demographics. For example, one 
member of the mathematics 
committee has proposed that students
use mathematics to model the impact 
of major historical events. For 
instance, one problem might be "How
different would contemporary Europe
be if the Black Death had not
occurred?"

Teachers are also designers, as they 
try these novel, more demanding 
approaches with students and assess 
how the materials work with a full 
range of students. What, for instance, 
does it take to get a student with a 
shaky mathematics background to 
apply reasoning capacities and ques- 
tioning abilities he or she may have 
developed elsewhere? 

Already by the summer of 1993,
mathematics teachers from all seven 
sites will address the question of 
teachers' learning. Joining with 
teachers from the College Board's 
EQUITY 2000 project, they will 
examine what teachers need to know 
in order to become strong coaches and 
diagnosticians for students working in 
challenging mathematical environ- 
ments. Subsequently, participants will 
assume the dual roles of instructor and 
critic, as they field-test a proposed 
sequence of applications that call for 
simple linear through complex loga- 
rithmic functions. 

What is emerging from these 
efforts? A radically different view of 
professional development: no shrink-wrapped,
teacher-proof materials to be
swallowed whole the night before. If
teachers are to become inventive users
of the course frameworks and skilled 
assessors of student work, they must 
be actively involved in all stages of 
implementation. 

New Questions About Assessment 

Two conflicting purposes often criss- 
cross assessment programs: the 
responsibility to use any assessment 
to respond to student work and
encourage growth and the demand 
that assessment provide reliable, 
quantifiable information about student
learning. As a nation, we have a long 
history of downplaying the first and 
highlighting the second. So consuming 
has our demand been for account- 
ability data that we have often allowed 
rote and short-answer testing formats 
to obscure the potential richness 
of assessment. But if students like 
Martin are to realize their dreams, we 
need a more complex view of student
assessment. 

Pacesetter will allow urban districts 
like San Diego to take part in a 
broader national discussion about 
combining these two aspects of assess- 
ment. While we clearly want to value 
authentic work and acknowledge 
student growth, as a school district, we
also have serious obligations to 
conduct student and program assess- 
ment responsibly. As we move toward 
more open-ended and authentic forms 
of assessment, no one should be 
allowed to fall through the cracks. 

Moreover, as our approaches to 
assessment move in this direction, 
serious questions arise about equity 
and costs. Fortunately, Pacesetter 
allows our teachers to work with an 
extensive team of researchers and 
assessment experts from Educational 
Testing Service. They are proposing 
new ways of combining our need to 
assess students' knowledge with our 
interest in recording their progress 
toward valued outcomes. 

Unanswered Questions

Many questions about Pacesetter are 
still unanswered. Present the program 
to teachers and administrators, and 
many hands fly up. People want to 
know:

1. When fewer than half of our
students sign up for fourth year math
or science, how can we get all
students to a level where they can take
Pacesetter courses in 10th or 12th
grades?

2. Particularly in hard financial 
times, how will we give teachers the 
time they need to teach and sustain the 
extra demands of Pacesetter courses? 

3. How can we use Pacesetter
courses, which are still taught within
traditional subject-matter boundaries,
to move toward a more integrated
high school experience?

4. Pacesetter courses are supposed 
to be designed for all students. How
will we include students with weak
academic histories, special education 
needs, or languages other than English 
in such demanding courses? 

5. The College Board produces 
other forms of student testing, such as
the SATs and the Achievement Tests.
How will Pacesetter's more open-ended
approach to student assessment
affect these other tests? 

There are no simple answers. Pacesetter
is a "work in progress," just as
the College Board is involved in 
rethinking its mission as a major 
educational institution. At the turn of 
the century, it was a tremendous move 
toward equity to insist that all students
be eligible for college on the basis of a 
common exam. No longer could your 
last name, and your father's occupa- 
tion and education, be the gatekeepers 
to education after high school. A 
hundred years later, we have learned 
that equity demands additional tools. 
We cannot claim to have "done our
job" when we have not offered
instructional and assessment opportu- 
nities that prepare students for college 
or the world of work. In that light, we
are going to have to reinvent our 
means. Pacesetter provides one labora- 
tory in which to do so. ■ 



Thomas W. Payzant is Superintendent 
of San Diego City Schools, 4100
Normal St., San Diego, CA 92103.
Dennie Palmer Wolf is Director of the
Performance Assessment Collaboratives
for Education (PACE) Project, and Senior
Research Associate, Harvard University,
8 Story St., Cambridge, MA 02138.

"Ensuring Reliable Scoring"

A Chapter in A Practical Guide to Performance Assessment,
by Joan L. Herman, Pamela R. Aschbacher and Lynn Winters,
Association for Supervision and Curriculum Development, 1992

In A Practical Guide to Performance Assessment, the authors offer guidance on the 
creation and use of alternative measures of student achievement. They present a process model 
that links assessment with curriculum and instruction based on contemporary theories of 
learning and cognition. 

The chapter reproduced here, "Ensuring Reliable Scoring," emphasizes the fact that a
fundamental feature of performance-based assessment is its reliance on human judgment. As 
any trial lawyer will attest, two people viewing the same occurrence or reading the same docu- 
ment often come up with conflicting perceptions or interpretations. Likewise, persons viewing 
the same behavior on different occasions may arrive at different judgments about that behav- 
ior. This chapter is intended to help developers minimize such differences by developing sound 
scoring procedures. 

This work was supported by the Office of Educational Research and Improvement, U.S. 
Department of Education, Cooperative Agreement Number R117G10027 and CFDA catalog no. 
84.117G. Copies of the book may be ordered from: 

Association for Supervision and 
Curriculum Development 
1250 N. Pitt Street 

Alexandria, VA 22314
(703) 549-9110 

Price: $10.95 

ASCD Stock Number: 611-92140 
ISBN: 0-87120-197-6 



Copyright, 1992 by the Regents of the University of California. 
Reprinted with permission of the Regents of the University of California.

Ensuring Reliable Scoring



A fundamental feature of performance-based assessment is its reliance 
on human judgment. As any trial lawyer will attest, two people viewing
the same occurrence or reading the same document often come up with
conflicting perceptions or interpretations. Likewise, persons viewing the
same behavior on different occasions may arrive at different judgments
about that behavior. The user or developer of alternative assessments
must seek to minimize such differences; otherwise the measures cannot
be fair, consistent, or valid. Sound scoring procedures help the process.



Understanding the Importance of
Reliability and Consistency 

The most obvious reason for consistent scoring is equity. To be meaningful,
judgments of student performance cannot be capricious. You need
to have confidence that the grade or judgment was a result of the actual
performance, not some superficial aspect of the product or scoring
situation. Was Yuki's grade unduly influenced by her spelling? Did Mark
get a better (or worse) grade because his project was graded near the end
when you were tired? How was Tamar's grade affected by the fact that
another teacher did part of the scoring? What about Corinne? Did she fail
the competency writing test this year because the raters were more 
stringent than last year? 

Inconsistency is especially troublesome when the results influence 
important decisions about students or programs. What grade does Den- 
isha deserve? Should Marta be allowed to take the Advanced Placement 
English class despite low standardized test scores? Should the school's 
new math program continue? Even when the results of a single assess- 
ment do not carry high stakes, inconsistency means inaccurate scoring. 
More to the point: inconsistent scoring means the scores have little 
meaning. If an "A" doesn't consistently represent excellent performance,
then what does it mean? The best in the class? The best of a poor lot? 
Improved effort? If a performance or project receives different scores 
from different judges, what does each really mean? Which one is accurate?
If you apply criteria differently depending on how long you've been
scoring, what does the final set of scores mean? What does an individ- 
ual's score mean? 



Achieving Consistency 

Equitable and meaningful scoring requires informed and consistent 
judgment. How do you avoid capricious subjectivity? As we discussed 
in Chapter 5, having well-defined and defensible criteria for judging
student performance goes a long way toward achieving consistent scor- 
ing, but there are other conditions that must be met to ensure consistent 
scoring. First, those making judgments (you, teacher colleagues, the
state department of education) must thoroughly understand the criteria
in a similar fashion. A consensus among raters about the meaning of the 
criteria and how they are to be applied builds the foundation for scoring 
consistency. Second, you need a system for monitoring the consistency 
of ratings over the period in which performance is being judged. This 
consistency has several facets. Two or more judges rating the same 
performance should have general agreement. One judge should rate a 
particular performance in much the same way regardless of when it is 
observed, whether during the beginning of the day, somewhere in the
middle, or near the end. Judges should rate the same performances
similarly on separate occasions. And the same performances rated on
two separate occasions by two different groups of judges should be rated
similarly. If your scores are used to make high-stakes decisions such as 
promotion, graduation, or special class placement, you should formally
document evidence of scoring consistency. 
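
The first facet above, agreement between two judges rating the same performances, can be checked with simple arithmetic. The sketch below is illustrative only: the function name percent_agreement, the ten invented scores on a 1-6 scale, and the one-point tolerance are assumptions made for the example, not a procedure taken from this chapter.

```python
from typing import Sequence


def percent_agreement(rater_a: Sequence[int], rater_b: Sequence[int],
                      tolerance: int = 0) -> float:
    """Return the percentage of papers on which two raters' scores
    differ by no more than `tolerance` scale points."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must score the same set of papers.")
    if not rater_a:
        raise ValueError("At least one scored paper is required.")
    agreements = sum(
        1 for a, b in zip(rater_a, rater_b) if abs(a - b) <= tolerance
    )
    return 100.0 * agreements / len(rater_a)


# Hypothetical scores on a 1-6 scale for ten papers (invented data).
judge_1 = [4, 3, 5, 2, 6, 4, 3, 1, 5, 4]
judge_2 = [4, 4, 5, 2, 4, 4, 3, 2, 5, 3]

exact = percent_agreement(judge_1, judge_2)                  # identical scores
adjacent = percent_agreement(judge_1, judge_2, tolerance=1)  # within one point
print(f"Exact agreement: {exact:.0f}%")
print(f"Agreement within one point: {adjacent:.0f}%")
```

The same function could also compare one judge's scores on the same papers from two separate occasions, which speaks to the other facets of consistency described above. What counts as an acceptable percentage is a local decision that depends on the stakes attached to the scores.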




A PRACTICAL GUIDE TO ALTERNATIVE ASSESSMENT



Professional Development Benefits 

The process by which judges learn to apply scoring criteria in a consistent
manner can provide a valuable opportunity for professional development.
Rater training helps teachers come to a consensual definition
of key aspects of student performance. This can lead to a reprioritization 
of classroom goals as well as insight about the strengths and weaknesses 
of their students' performances. The scoring process can provide a model 
for classroom assessment and encourage more collaboration among 
teachers in the appraisal of student outcomes. 

To reap the benefits of consistency and professional growth, you will 
need good training procedures and a carefully structured rating process. 
This chapter outlines major considerations in devising and implement- 
ing a valid scoring procedure. Although the process we describe has its 
origin in formal, high-stakes assessments at the district and state level, 
keep in mind that consistent scoring applies to all forms of assessment, 
be they classroom grades or college admissions. Decisions about a 
student can't be valid unless based on reliable information. 

Rater Training: 
A Prerequisite for Consistent Scoring 

There are a number of ways to achieve consistency. Our approach 
emphasizes training raters to a common standard because this approach 
is efficient and provides teachers with instructionally useful information.
Other approaches devote less attention to rater training and consensus-building
and rely on multiple judgments of student work to
achieve a similar result. As you might expect, the approach you choose 
depends on your assessment purpose and available resources. 

During rater training, judges learn what the scoring criteria mean, 
what aspects of performance each is intended to capture, and what each 
of the scale points represents. It is during the training session that you 
make sure raters apply the criteria consistently to a range of student work 
samples. This is also the time when raters learn how to record their scores. 

Training Manuals 

Formal scoring manuals can be very helpful both during and after 
training. For large-scale assessments, such as yearly district or state
testing programs, a scoring manual provides an "institutional memory"
of assessment procedures and serves as a useful reference for interpreting
scores. For high-stakes classroom assessments, such as Advanced Placement
"screening" examinations, or an algebra readiness test, scoring
manuals can be useful in discussions with parents or students who want 
to know how scores are achieved or improved. Typical scoring guides 
include: 

■ Fully explicated scoring criteria; 

■ Examples or models illustrating each score point;

■ An abbreviated, one-page version of the criteria for reference
during actual rating; and

■ A sample form for recording scores. 

You might want to review training manuals from several sources before 
designing your own rater training. If you are interested in a detailed 
description of the rater training process, a complete scoring manual 
developed by the Riverside Publishing Company appears in Educational 
Performance Assessment, edited by Fred Finch (1991). State departments
of education are also sources of published scoring manuals.



Training Procedures 

Actual rater training is designed to create a consensual understanding 
of the scoring criteria, provide extensive practice in actual scoring, and, 
in the case of high-stakes assessment, document acceptable levels of
scoring consistency (reliability). During rater training, practice scoring 
sessions provide raters immediate, substantive feedback about their 
judgments and ample opportunities to ask questions. Raters also come
to understand that their job is to make a judgment based on the scoring 
rubric, not to revise or criticize the rubric and then follow their own 
inclinations. Without such an understanding, an entire assessment
enterprise can be sabotaged.

A typical training session includes: 

■ Orientation to the assessment task. Raters receive an overview of 
the assessment context, what the results will be used for, who will
use them, what directions and prompts the students received, and
how the scoring guide operationalizes desired outcomes or processes.
It is common to ask raters to actually take the test as a means
of orienting them to the scoring task.

■ Clarification of the scoring criteria. In this phase of training,
raters engage in extensive discussion. Both the criteria dimen- 
sions and scale values are defined and a range of models provided 
to exemplify each. Discussion often moves from simpler judg- 
ments, such as which samples illustrate high, medium, or low 
performances, to more difficult distinctions required for assign- 
ing numerical scores. 

■ Practice scoring. This is the heart of the rater training process. At 
first, sample assessments are scored one at a time with discussion 
following each paper. As raters become more fluent with the
scoring guide, they get opportunities to exercise more difficult
judgments with problematic (atypical) or borderline assessments.

■ Protocol revision. During the discussion and practice scoring, 
raters naturally devise certain rules for dealing with the unantici- 
pated aspects of judgment posed by a particular set of papers and 
not covered by the scoring guide. For example, when almost every
student has misinterpreted the test prompt in the same fashion,
rather than score all answers as "off topic" or "unacceptable,"
raters may decide to assign scores based on the student-defined
task. Or, if many traits are to be scored, raters may decide that
different raters should specialize in scoring a few of the traits
rather than having all raters score every sample on every
dimension.

■ Score recording. For all assessments, student scores must be 
recorded in some fashion, on the roll sheet or on summary sheets 
for a classroom, grade level, or school. Rater training covers the 
format for recording scores and any special procedures for calcu- 
lating student scores such as averaging and totalling across di- 
mensions. 

■ Documenting rater reliability. Rater training ends when there is 
agreement that scorers have reached an acceptable level of con- 
sistency, usually rating sample pieces within one point of each 
other. In order to determine when raters are ready for the real
thing, reliability checks are conducted during training. Figure 6.1
provides an example of how to check rater consistency using the
percent agreement method.

■ Scheduling considerations. How much time will it take to train
raters to an acceptable level of agreement before letting them
judge student work? It depends on:

- How experienced your raters are.

- Whether they are familiar with your scoring criteria.

- How quickly raters come to consensus about the meaning of
the criteria.

- The complexity of the scoring criteria and the quality of the
work to be judged, with borderline work being the most
difficult to assess quickly.

We have found that it takes about three to four hours to train raters to
use a holistic or simple (two- to four-trait) analytic scale. More complex
scales can require up to a full day of training.

Rater fatigue is an important factor in scoring; we consider a six-hour
session a full day's work. You should also schedule time for retraining
or refreshing raters at the beginning of each new scoring day, and
certainly for any changes in topics or tasks that use the same scoring




Figure 6.1 

Calculating Rater Agreement 
(Three raters for two papers)

[The score table did not survive in the source scan. Recoverable totals:
exact agreement with the criterion scores, 67% on paper 1 and 33% on
paper 2, 50% overall; agreement within one point, 100% on paper 1 and
83% overall.]

Figure 6.1 illustrates the case in which three raters are asked to rate two criterion papers
after some training. According to the results in the figure, Linda agrees with the criterion
score for paper 1 but not for paper 2; in fact, for paper 2 she is not even within one point
of the criterion score. Robert is not in perfect agreement with the criterion scores on either
paper 1 or paper 2 but is in agreement plus-or-minus one score point on both papers. Ella
agrees exactly with the criterion scores on both papers. Robert and Linda probably
need a little more training. Paper 2 causes more problems for raters than paper 1, so
further training should focus on distinguishing the criterion score from neighboring scale
points. In reporting these results you could say, "On average, raters obtained perfect
agreement with criterion scores 50 percent of the time, and reached ±1 agreement 83
percent of the time."

criteria. In high-stakes assessment, retraining often takes place after any
lengthy breaks such as lunch.

Training Paper Issues 

Because rater training provides a dry run for actual scoring, it behooves
you to anticipate as many sources of disagreement as
possible before rater training and to build opportunities into the training
papers for eliciting disagreement and discussing it. For example, the
syntactical constructions used by non-native English speakers raise
issues related to balancing content with communication concerns. You
should also deal with handwriting and legibility issues or aesthetic
quality concerns in visual and performing arts. Finally, you want to be
sure that the sample papers you select for training represent not only
each point on the score distribution but also the entire range of student
performance likely to be encountered in scoring. The natural human
tendency is to grade normatively. The better work samples from a set of
relatively poor papers may receive higher scores than they would were
they part of a stack of relatively good papers. The reverse can also be the
case. This tendency should be discussed during rater training with
examples provided so that the scoring criteria maintain the same
meaning across different sets of papers and different scoring occasions.

Obtaining Sample and Check Papers 

Because a wide array of sample work is needed to guide raters, you
should collect samples from a diverse group of students. Pick work from
a field test, a previous assessment, or the actual assessment. To
identify appropriate training and check papers, a group of "experts"
(teachers from the grades and subjects involved who are familiar with
the scoring criteria) should select samples
that illustrate the range of responses, from clear to borderline, for each
score point so that raters will be trained to handle all situations. If several
prompts or tasks are used in the assessment, examples need to be drawn
from each. If you are using age-related scales across grade levels, you need
examples to illustrate each age level. It is also useful to prepare comment
sheets explaining how the specific aspects of each piece of work represent
criteria for a particular score. The expert group can then identify
samples that will be used for (1) training discussions, (2) practice, and
(3) checking consistency.

Score Recording Concerns

You need to provide raters a method for recording student scores. In your 
own classroom, you might simply record scores at the top of the student's 
paper and then in your roll book. Some teachers use the scoring criteria 
as a feedback sheet for students. They circle deficient areas or note 
strengths using the descriptors on the guide. The same process can be 
used to create a classroom profile on one master scoring guide. 

In more formal assessment settings, score sheets become a matter of 
public record and are used to provide feedback to teachers and others. 
Data analysts also use them to calculate test statistics. In these instances, 
raters are often given machine-readable documents for "bubbling" in
student scores as well as other important information such as the school, 
district, student, and rater identification numbers and the code numbers 
for topic or task and date. Whenever you have two or more raters scoring 
student work, you'll need to remind them not to indicate scores, com- 
ments, or corrections on the sample itself. You don't want a subsequent 
rating influenced by their comments. 



Reliability Issues 

The purpose of rater training is to create consistent, reliable scoring 
procedures. Thus, a method of determining if raters are consistent 
should be built into the training period. Many strategies for checking 
rater reliability exist. One commonly employed approach is to prepare 
in advance and score a set of ten or so "reliability check" papers
representing the range of student performance. Ask the raters to score
this same set and compare their judgments with yours or those of other
trusted assessors. Reasonable agreement with both the expert judgments
and with each other suggests that raters are ready to score actual student
work. 

What constitutes reasonable agreement? You can ask that all raters be 
in exact agreement before you consider them reliable, or you can use the 
less stringent "plus or minus one" rule, which is fairly common and says 
that raters are "in agreement" if they agree within one scale point, "plus 
or minus." For example, if the score on a particular reliability-check
sample is a "3," anyone who gave it a rating of "2," "3," or "4" is
considered to be on target. 
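
The two agreement rules above can be sketched in a few lines of Python. This is only an illustration, not a procedure taken from the guide: the function name `percent_agreement`, the `tolerance` parameter, and the five sample scores are all hypothetical.

```python
def percent_agreement(rater_scores, criterion_scores, tolerance=0):
    """Fraction of check papers scored within `tolerance` points of the criterion."""
    if not criterion_scores or len(rater_scores) != len(criterion_scores):
        raise ValueError("need one rating for each check paper")
    hits = sum(
        abs(given - target) <= tolerance
        for given, target in zip(rater_scores, criterion_scores)
    )
    return hits / len(criterion_scores)


# Hypothetical scores on five reliability-check papers (not from the guide).
criterion = [3, 4, 2, 5, 1]
rater = [3, 3, 2, 3, 1]

exact = percent_agreement(rater, criterion)                   # exact agreement
adjacent = percent_agreement(rater, criterion, tolerance=1)   # "plus or minus one"
print(f"exact: {exact:.0%}, within one point: {adjacent:.0%}")
```

With these made-up scores the rater matches the criterion exactly on three of five papers and is within one point on four of five.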

Regardless of the target level of agreement you choose, when you train
raters, the goal is to have them apply the scoring criteria exactly as
intended, not to within one scale point of the target score. When a rater
has difficulty applying the criteria exactly as intended, you should spend
time during training discussing the practice papers, criteria, and deci- 
sion rules for applying the criteria in order to bring the rater up to an 
acceptable level of consistency. However, some raters may not be able to 
adjust their internal criteria to match the scoring guides. These aberrant 
scorers should be dismissed or assigned to other tasks during actual 
scoring. 

In addition to deciding how close ratings should be to establish 
consistency, you need to think about how often they need to be in such 
agreement. If you are asking for exact agreement, which can be difficult 
to obtain, your criterion for reliability may be less stringent than if you 
are using the "plus or minus one" rule. At CRESST, we often ask that 
raters agree with the experts at least 90 percent of the time on each 
scoring dimension when using the "one point off" guideline. The guideline
for exact agreement could drop to 75 to 80 percent under the more
stringent condition. The actual percentage of agreement varies depend- 
ing on the assessment purpose and stakes involved. 
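
A qualifying check of this kind might be sketched as follows. The code is a hedged illustration rather than CRESST's own procedure: the function names, the dimension labels "content" and "organization," and the score pairs are assumptions made for the example, while the 90 percent and 75 percent thresholds come from the guideline above.

```python
def agreement_rate(pairs, tolerance):
    """Share of (rater, expert) score pairs that differ by at most `tolerance`."""
    return sum(abs(r - e) <= tolerance for r, e in pairs) / len(pairs)


def rater_qualifies(scores_by_dimension, tolerance=1, required=0.90):
    """True only if the rater meets the required rate on every scoring dimension."""
    return all(
        agreement_rate(pairs, tolerance) >= required
        for pairs in scores_by_dimension.values()
    )


# Hypothetical results: (rater score, expert score) on ten check papers.
check = {
    "content": [(3, 3), (4, 4), (2, 3), (5, 5), (1, 1),
                (3, 3), (4, 3), (2, 2), (5, 4), (3, 3)],
    "organization": [(3, 3), (2, 4), (2, 2), (4, 4), (1, 1),
                     (3, 3), (4, 4), (2, 2), (5, 5), (3, 1)],
}

# "One point off" rule at 90 percent, checked on each dimension.
ready = rater_qualifies(check)
# Exact agreement at 75 percent, checked on one dimension only.
exact_ok = rater_qualifies({"content": check["content"]}, tolerance=0, required=0.75)
```

Checking each dimension separately, as the guideline asks, keeps a rater who is consistent on one scale from masking drift on another.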

Regardless of how you define "rater agreement," the purpose of
reliability checks is to ensure that student scores aren't the result of
capricious judgment, one of the most commonly cited arguments against 
performance assessment. Consider the classic study conducted by Paul 
Deidrich (1963) at the Educational Testing Service in which the same 
essay was assigned an entire range of scores by a group of raters. What 
most don't remember about this study is that acceptable levels of rater 
agreement were obtained when the judges (1) were drawn from the same 
discipline, (2) used explicit scoring criteria, and (3) participated in a 
training session. 



Ensuring Equitable Judgments During 
an Actual Scoring Session 



Maintaining Consistency 

Documenting rater consistency during training is simply the first step
toward creating a fair, equitable scoring process. Because the purpose of
rater training is to develop rater consistency, you need to monitor rater
scoring patterns during the actual scoring process as well. Research
shows that raters have a tendency to drift away from formal criteria to
their own, more idiosyncratic views (Quellmalz and Burry 1983). Human
judgments and expectations are shaped not only by formal standards,
such as scoring criteria, but also by raters' prior experience and the
actual range of performance currently being assessed. If the entire set of
performances appears to be relatively "poor" according to the objective
criteria, raters develop a tendency to shift the criteria downward so they
can award higher scores to the "best of the worst" papers. As a teacher,
you too have perhaps been aware that your standards and expectations
for students change during the grading process. You modify your ideas
somewhat after looking at several pieces of student work. For this reason,
training sessions need to include a large sample of papers and the entire
range that might be encountered during actual scoring.

For classroom assessment purposes, you can check your consistency 
by stopping midway and rescoring some of the first student work you
scored. When you are scoring several different dimensions or topics, you
can score all work on one dimension or related to one topic at the same
time, then go back and score for other factors. Scoring all papers several
times, once for each different dimension or topic, is often quicker than
going through individual papers for everything at once and applying
multiple criteria or reading different kinds of responses. Your scoring
pace also increases as you become familiar with the criteria.

For school-level, larger-scale, or high-stakes assessment, you'll want 
to build in more formal rater consistency checks. For essay scoring, this
is sometimes done by burying pre-scored common check papers at
designated intervals in each rater's stack of papers. The scoring director
then checks raters on the common paper and works with those who have
drifted away from a consistent application of the scoring guide. Another
method is to conduct mini-training sessions first thing in the morning
or right after lunch. Raters score a common set of check papers, much as
they did in training. Those who have drifted from the preset standard
(exact agreement or plus or minus one point) participate in a review
session and are rechecked before being allowed to continue scoring.

An additional consistency consideration in large-scale assessment 
relates to lack of bias in rater judgments. You need to be sure that raters 
working together don't form subgroups who agree with each other but 
not all the other participating raters. To avoid this, break up rater groups 
at periodic intervals and have second ratings of papers/work done by
raters assigned to other tables or physical locations.

Managing Logistics

Although achieving consistent judgment is the overriding concern of 
scoring, conducting a scoring session involves a number of logistical and 
technical issues. Scheduling is one of the most fundamental concerns in 
planning a scoring session. As people tend to tire in the afternoon and 
rate more slowly, you might consider scheduling your rating sessions 
early and avoiding the late afternoon. Access to a copy machine enables 
you to address any unanticipated shortages of rating materials or to
reproduce papers that require discussion during the rating session. 
Further, rating is an intense activity; provide frequent breaks and snacks 
(lots of fruit and carbohydrates, little sugar). The scoring area itself 
should be quiet and comfortable with ample room for raters to accom- 
modate the work to be reviewed. A rater's nightmare is to work in the 
gym on folding chairs and tables at 3:30 on a hot May afternoon during 
band practice. 

Another concern is managing the flow of papers or other student 
products. In large-scale assessments, each table of scorers should have
its own leader whose sole duty is to manage the paper flow and
monitor rater consistency. Our experience suggests that bundles of
student work that take about one hour to rate are easier for raters to
handle than individual pieces. The number of pieces in each bundle will
vary with the nature of the task and the complexity of the scoring scheme.
In writing assessments, for example, sets often consist of fifteen to
twenty-five papers, whereas a bundle of portfolios might include only
four to six. Regardless of how work is bundled, individual pieces must
be randomly assigned to bundles and bundles randomly assigned to 
raters so that no systematic scoring effects occur. For formal assessments, 
both raters and students should be assigned identification numbers to 
guard against bias and protect privacy. 
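
One way to carry out the double randomization described above is sketched here in Python. It is an assumption-laden illustration: the function name `make_bundles`, the identification numbers, and the round-robin hand-out of shuffled bundles are choices made for the example, not steps prescribed by the guide.

```python
import random


def make_bundles(paper_ids, bundle_size, rater_ids, seed=None):
    """Randomly place papers in bundles, then hand bundles to raters in random order."""
    rng = random.Random(seed)
    papers = list(paper_ids)
    rng.shuffle(papers)                     # pieces randomly assigned to bundles
    bundles = [papers[i:i + bundle_size]
               for i in range(0, len(papers), bundle_size)]
    rng.shuffle(bundles)                    # bundles randomly assigned to raters
    assignment = {rater: [] for rater in rater_ids}
    for index, bundle in enumerate(bundles):
        assignment[rater_ids[index % len(rater_ids)]].append(bundle)
    return assignment


# Hypothetical identification numbers for 60 papers and three raters.
papers = list(range(1001, 1061))
raters = ["R01", "R02", "R03"]
plan = make_bundles(papers, bundle_size=20, rater_ids=raters, seed=7)
```

Working only with identification numbers, as here, also keeps student and rater names out of the scoring materials.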

You'll need to decide whether to mix different grade levels or differ- 
ent topics together in the same scoring session. Generally, this is not done 
unless the purpose of an assessment is to compare students at different 
grade levels on the same scoring scale. In large-scale assessments, 
different topics are either assigned to different rater groups or scored 
separately from each other with a session of refresher training preceding 
the topic change. 

Another concern that can cause problems later if not monitored 
carefully is ensuring that scorers are recording required information 
properly. Were all identification numbers bubbled in along with the
scores? Were scores recorded for all papers rated? Do all students have
scores? The list is extensive. Try to anticipate what can go wrong and
devise strategies for either preventing it from happening or for fixing it. 



Ensuring Technical Quality 

Advice on all the technical decisions you have to make to ensure scoring
accuracy and equity is beyond the scope of this book and in fact
constitutes a psychometrician's career. If you are assessing for a high-
stakes decision, especially if that decision can get you sued, disparaged
on page one of your local newspaper, or called before the board of
education, you may want to bring in a technical consultant to structure
your scoring process and help you document the reliability of student
scores. Following are some of the questions you need to address:

How many raters are needed? This, of course, depends on how much 
work is rated, how many ratings each piece will receive, how long it 
takes to rate each piece, and how many days are available for scoring. 
Holistic scoring of one-to-two page essays generally goes quickly, some- 
times as quickly as a minute a paper. A complex analytic rating on longer 
pieces can take four to five minutes per paper. Portfolios can take longer
still. As for the number of days, our experience suggests raters can get
quite burned out after four or five days.
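
The staffing arithmetic can be roughed out as below. This is a back-of-the-envelope sketch under stated assumptions: the function name `raters_needed` and the example workload are hypothetical, the six-hour scoring day is borrowed from the scheduling advice earlier in the chapter, and the estimate ignores time for training, retraining, and breaks.

```python
import math


def raters_needed(pieces, ratings_per_piece, minutes_per_rating,
                  scoring_days, hours_per_day=6):
    """Smallest number of raters able to finish the workload in the days available."""
    total_minutes = pieces * ratings_per_piece * minutes_per_rating
    minutes_per_rater = scoring_days * hours_per_day * 60
    return math.ceil(total_minutes / minutes_per_rater)


# Hypothetical workload: 3,000 short essays, two holistic ratings each,
# about one minute per rating, two scoring days.
estimate = raters_needed(pieces=3000, ratings_per_piece=2,
                         minutes_per_rating=1, scoring_days=2)
```

Here 6,000 rating minutes divided by 720 minutes per rater rounds up to nine raters; treat the figure as a floor, since real sessions lose time to retraining and consistency checks.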

How many scores per paper? Effective training and vigilant monitor- 
ing of the scoring process can eliminate much of the need to do multiple 
scoring of the same dimension of student work. Multiple raters are 
needed for each paper when raters are inexperienced or there is little 
evidence that raters are using the same criteria and standards in making 
their judgments. The need for multiple scores depends on your assess- 
ment purpose. The more serious the consequences, the more important 
it is that you document consistency. Our experience suggests that no
more than two raters are needed for any piece; the ratings can be summed 
or averaged to provide a final score. A third opinion can be called in for 
difficult cases, such as the occasional nightmare paper that draws both 
the lowest and highest score. 
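
A simple rule for combining two ratings might look like the following sketch. The function name `final_score` and the `max_gap` threshold that triggers a third reading are assumptions for illustration; the guide says only that ratings can be summed or averaged and that a third opinion can be called in for difficult cases.

```python
def final_score(first, second, max_gap=1):
    """Average two ratings, or return None to signal that a third reading is needed."""
    if abs(first - second) > max_gap:
        return None          # discrepant ratings: route the paper to a third rater
    return (first + second) / 2


# Hypothetical pairs of ratings on a six-point scale.
close_pair = final_score(3, 4)
nightmare_paper = final_score(1, 6)
```

In this sketch `close_pair` is 3.5 and `nightmare_paper` is `None`, which a scoring director could use as a flag for a third reading.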

In some situations, one score is sufficient for a majority of the pieces. 
Consider a situation in which selection, placement, or other critical 
decisions about individual students will be made based on some prespecified
standard or cut score. If your training and scoring check papers
show that raters are consistent, the only papers requiring two or more 
ratings will be those borderline papers falling around the passing score. 
Because rating is an expensive process, you will need to balance reliability
concerns against those for cost and efficiency.

How are papers scored for evaluation purposes? If student scores
will be used for program evaluation rather than individual assessment,
a reliable estimate of an individual student score is less critical than the
average score for the task. Most pieces of work can be read only once,
and your reliability evidence can be obtained on a sample of work
(perhaps 20 percent), which is rated by two or more raters. If you are
using student samples to evaluate a program and don't have to provide
individual scores to teachers, it is more efficient to score a randomly 
selected sample of student work. Your technical consultant can advise 
you about sample size and the appropriate manner of selection. 
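
Drawing the double-scored subset at random could be done along these lines. The sketch assumes a simple random sample; the function name `double_scoring_sample`, the paper numbers, and the fixed seed are hypothetical, and a technical consultant may well recommend a different sample size or a stratified design.

```python
import random


def double_scoring_sample(paper_ids, percent=20, seed=None):
    """Randomly pick the papers that will receive a second, independent rating."""
    papers = list(paper_ids)
    size = -(-(len(papers) * percent) // 100)   # integer ceiling of the percentage
    return sorted(random.Random(seed).sample(papers, size))


# Hypothetical identification numbers for 100 papers.
papers = list(range(2001, 2101))
second_read = double_scoring_sample(papers, percent=20, seed=11)
```

Fixing the seed makes the selection reproducible, which helps when you later document how the reliability sample was chosen.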



Providing Evidence of Reliability 

For high-stakes assessments, you need to formally document the consistency
and reliability of your scoring process. Plan to invest in the services
of a technical expert in advance of the scoring to ensure that you have 
an adequate scoring design, that you are collecting suitable evidence, 
and that your data are appropriately formatted to ease data analysis. 
The following are some relevant sources of evidence: 

■ Results of the qualifying check after training. Plan to report on 
what agreement level was required. What proportion of your 
raters passed on the first try? What was the average level of
agreement among those passing? 

■ Results of the consistency check during scoring. Plan to report 
on what agreement level was required. How many checks were made,
and when? What proportion of your raters passed without
remediation? What was the average level of agreement on the 
checks? 

■ Inter-rater reliability results for student work scored by more 
than one rater. Percentage agreement among raters and generaliz- 
ability coefficients are two frequently used techniques. Each of 
these is calculated separately for each scale you use. As a guide, 
you need to double score at least 20 percent of your student
samples to get sufficient evidence, and if more than two raters are 
involved, you need to consult a statistician for help with a 
balanced design specifying which raters are to score which pieces 
of student work. 

What level of agreement or reliability is high enough? Of
course the answer is: it depends on the decisions you are making.
The more critical or restrictive the consequences are, the more
reliable your scores need to be. In general, reliability coefficients
of .70 and above are considered respectable. Coefficients of .90
and above are not uncommon with standardized multiple-choice
tests and large-scale direct writing assessments.

■ Rater consistency across years. When you want to be sure that
your rating scale is consistent from year to year (for example,
when results are being used in state assessments to track trends
over time), you need to include with this year's scoring a sufficient
sample of student work from last year's scoring. Agreement
in scores assigned can then be checked, and if necessary, statistical
adjustments can be made for differences.

■ Rater consistency across different locations or different groups
of raters. Similar to checking consistency across years, when scoring
is divided among sites or groups you need to check on the consistency
of these different groups. For example, a state might convene four
regional workshops to score its hands-on science assessments, or
a district assessment might require each school to score its
own students' work. One way to check consistency would be to
seed the work scored by each group with a common set of work.
At scoring site one, for instance, scorers would assess student
work assigned specifically to site one plus the common set; site
two scorers would assess student work assigned specifically to site
two plus the common set. Scores on the common set
can then be checked for consistency.

■ Intra-rater consistency. This is the degree to which one rater
remains consistent over time. Check for this by having raters
rescore the same work at different points in the scoring
process.


Checking the Reliability of 
Your Rating Process 

To summarize the points made in this chapter, use the following
checklist. Your rating process is sound and reliable if you have:

[ ] documented, field-tested scoring guide

[ ] clear, concrete criteria

[ ] annotated examples of all score points

[ ] ample practice and feedback for raters

[ ] multiple raters with demonstrated agreement prior to scoring

[ ] periodic reliability checks throughout

[ ] retraining when necessary

[ ] arrangements for collection of suitable reliability data 



References 

Baker, E.L., P.R. Aschbacher, D. Niemi, and E. Sato. (1992). CRESST Performance
Assessment Models: Assessing Content Area Explanations. Los Angeles:
University of California, Center for Research on Evaluation, Standards, and
Student Testing.

Deidrich, P.B. (1963). "The Measurement of Skill in Writing." School Review 54:
584-592.

Finch, F. (1991). Educational Performance Assessment. Chicago: Riverside
Publishing Company.

Quellmalz, E., and J. Burry. (1983). "Analytic Scales for Assessing Students'
Expository and Narrative Writing Skills." (CSE Resource Paper No. 5). Los
Angeles: University of California, Center for Research on Evaluation, Standards,
and Student Testing.






CRESST Performance Assessment Models: 
Assessing Content Area Explanations 

This handbook presents a performance-based approach to assessing students' under- 
standing of subject matter content. It is based on years of research conducted by the Center for 
Research on Evaluation, Standards, and Student Testing (CRESST), funded by the U.S. 
Department of Education's Office of Educational Research and Improvement. The handbook 
includes: (1) a concise model of alternative assessment for those who need to develop similar
assessments on their own; (2) examples of successful CRESST assessment materials; (3) an
effective scoring rubric for performance assessments applicable to a variety of topics; and (4)
useful benchmark papers. 



Four parts of the handbook are reproduced here: 

• The Table of Contents 

• The Introduction 

• Chapter 1 : Overview of CRESST Research 

• Sample student assessments in chemistry 



The handbook was written by Eva L. Baker, Pamela R. Aschbacher, David Niemi and Edynn 
Sato with support from the Office of Educational Research and Improvement, U.S. Department
of Education, Cooperative Agreement Number R117G10027 and CFDA catalog no.
84.117G.



Copies are available for $10 from: 



UCLA 
CRESST 

Graduate School of Education 

405 Hilgard Ave. 

Los Angeles, CA 90024-1522 



Reprinted with permission of the Regents of the University of California.






CRESST Performance 
Assessment Models:

Assessing Content Area Explanations 



Eva L. Baker, Pamela R. Aschbacher, 
David Niemi and Edynn Sato 



April 1992 








Table of Contents 



Introduction 



Chapter 1 

Overview of CRESST Research 



Chapter 2 

Guidelines for Using CRESST's Model 

for Assessing Explanation 

Chapter 3 

Sample Assessment Materials for Students 

Chapter 4 

Specifications for Developing Assessment Materials 

Chapter 5 

Rater Training, Scoring and Reporting 

Chapter 6 

Sample Training Materials 






Introduction 



This handbook presents a performance-based approach to as- 
sessing students' understanding of subject matter content. It is based 
on years of research conducted by the National Center for Research on 
Evaluation, Standards, and Student Testing (CRESST), funded by the 
U.S. Department of Education's Office of Educational Research and 
Improvement (OERI). The purposes of this handbook are to: 

• provide one model of alternative assessment for those who 
need to develop similar assessments of their own; 

• introduce successful examples of CRESST assessment ma- 
terials; and 

• facilitate research on other alternative assessments. 

The materials which follow are the result of our five-year research
effort designed to explore the development of alternative assessments in
history. To summarize, the project has attempted to find ways to score
the content quality of essays in history. Using the writing of expert
historians as the basis of scoring criteria, we have developed techniques
for measuring the deep understanding of history and for scoring student
work reliably. Our work has been conducted using students from grades 
8 through 12 and has been expanded to other content areas as well 
(economics and science). 

These assessment tasks are consistent with cognitive learning 
theory. They include recalling prior knowledge in a content area, 
reading primary source documents containing new information, and 
writing an explanation of important issues that integrates new and 
prior information. 

Our assessment judges student understanding on the basis of six 
scales, including the use of concepts and facts, the avoidance of major 
misconceptions, and the quality of the argument presented. The scales 
were developed from studies of expert and novice performance. We have 
used this assessment approach to research a number of technical issues 
in performance assessment and have demonstrated the reliability, 
validity, and generalizability of this technique.

We believe that this assessment could be useful for both large-
scale accountability and diagnostic improvement of instruction. Typi-
cally, measurement experts have argued that accountability and
diagnosis should be conducted with separate kinds of assessments.
But for practical, economic, and conceptual reasons, we argue that
they can be merged into a single measure, with different methods of
reporting the data for different purposes.

Inside you will find background information on our CRESST
performance-based assessment, examples of assessments for second- 
ary level history and chemistry, and specifications for duplicating our 
technique with other topics and subject matter areas. We also describe 
our rater training process, scoring techniques, and methods for report- 
ing results. 



Interested users may contact CRESST at (310) 206-1532 for 
copies of additional materials, assistance using them in an assess- 
ment program, help in developing assessments for new topics, or for 
technical information about the rating scales.



Chapter 1

Overview of CRESST Research 



Cognitively Sensitive Assessment 

Tests should measure significant learning in a way that supports
desired performance. This simple concept should lead us, as
educators, to a reversal of our present use of standardized tests which
fail to measure deep understanding of student learning. Instead of
having tests constrain instruction, assessment procedures should
build directly on learning. 

Despite the widespread interest in alternative assessments, 
there has been relatively little research on the design and technical
quality of such measures. CRESST began conducting research on
history performance measures in 1988. Focusing on both explanation
and knowledge representation skills, we have attempted to develop a
better method for validly assessing secondary students' deep
understanding of content areas such as history.

Many current performance assessments are developed with 
minimal design constraints because clearly acknowledged technology 
does not exist for performance task design. Developers seem to focus 
on a few limits when they create new assessments. One set of
constraints concerns logistical issues, such as assessment time and
availability of materials. Another emphasis has been on the surface
characteristic of the task, that it exhibits motivational or "authentic"
attributes of the assessment.

Teachers and other developers want assessments that capture 
the imagination of students, intrinsically motivate, and if possible, 
demonstrate relevancy to real-world demands and expectations. Far 
less attention has been paid to design constraints focused on increasing 
the technical quality and the economic feasibility of the resulting
assessments.

CRESST's research assumes that a desired goal of performance
assessment is the generation of "comparable" tasks for estimating
student achievement. Our approach has sought to produce comparability
by designing it at the outset rather than adjusting findings post
hoc through scaling and statistical equating. Specifications to control
cognitive demands of the task, the structure of the assessments, and
the generation and application of scoring rubrics have been thought to
produce performance that showed less variability from topic to topic
than tasks created with fewer design constraints. In our attempt, we 
have tried to control both rater and score reliability. 

Our history performance tasks, which have evolved over time,
require students to engage in a sequence of assessed steps, taking a
minimum of one-and-a-half hours per topic. First, students are
assessed on their relevant background knowledge of the particular
historical period. This measure consists of a 20-item, short-answer
test with questions to measure student knowledge of historical
principles and specific events pertinent to the period.

Next, students are provided with opposing viewpoints in primary
source text materials, typically letters or speeches of historical
figures. Finally, students are asked, in a highly contextualized set of
directions, to write an essay that explains the positions of the authors
of the texts, and to draw upon their own background knowledge for
explanation. In some studies we have given students optional
resources to read, or have asked students to prepare HyperCard or
concept map representations of the key knowledge, principles and
relationships in the text materials (Baker, Niemi, Novak, & Herl, in
press).

CRESST conducted a series of studies to determine how scoring 
rubrics should be developed, and the best strategy relied on looking at 
differences between expert and novice performance (Baker, Freeman, 
& Clayton, 1991). The essay scoring rubric consists of six dimensions:
a General Impression of Content Quality scale (focused on the overall
quality of the content understanding) and five analytic scales:

• Prior Knowledge (the facts, information, and events outside the
provided texts used to elaborate positions);

• Number of Principles or Concepts (the number and depth of
description of principles);

• Argumentation (the quality of the argument, its logic and
integration of elements);

• Text (the use of information from the text for elaboration);

• Misconceptions (the number and scope of misunderstandings in
interpretation of the text and historical period).

Each of the above dimensions is scored on a 0-5 point scale. 
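
The rubric structure described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the identifier names, the integer-only scores, and the exact-agreement index are hypothetical choices made for the example, not CRESST's own scoring software or its reliability statistic. The sketch records one rater's six 0-5 scores for an essay, rejects out-of-range values, and reports the fraction of dimensions on which two raters gave identical scores.

```python
from dataclasses import dataclass

# Dimension names follow the rubric described in the text; the identifiers
# themselves are illustrative assumptions, not CRESST's own labels.
DIMENSIONS = (
    "general_impression",
    "prior_knowledge",
    "principles_or_concepts",
    "argumentation",
    "text",
    "misconceptions",
)
MIN_SCORE, MAX_SCORE = 0, 5  # each dimension is scored on a 0-5 point scale


@dataclass(frozen=True)
class EssayRating:
    """One rater's scores for one essay, keyed by rubric dimension."""

    scores: dict

    def __post_init__(self):
        # Every dimension must be present exactly once, with no extras.
        missing = set(DIMENSIONS) - set(self.scores)
        extra = set(self.scores) - set(DIMENSIONS)
        if missing or extra:
            raise ValueError(
                f"missing={sorted(missing)}, unexpected={sorted(extra)}"
            )
        # Assumption: scores are whole numbers within the 0-5 range.
        for name, value in self.scores.items():
            if not isinstance(value, int) or not MIN_SCORE <= value <= MAX_SCORE:
                raise ValueError(
                    f"{name} must be an integer from 0 to 5, got {value!r}"
                )


def exact_agreement(a: EssayRating, b: EssayRating) -> float:
    """Fraction of dimensions on which two raters gave the same score."""
    same = sum(a.scores[d] == b.scores[d] for d in DIMENSIONS)
    return same / len(DIMENSIONS)
```

Exact agreement is used here only because it is the simplest rater-consistency index to show; the text does not state which reliability statistic the studies used, and it does not say whether or how the six scores are combined, so the sketch keeps them separate.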

History experts and high school teachers have been involved 
throughout the study as co-designers, reviewers, and raters of the 
assessment. So far, six complete sets of history assessments have been 
developed: two on the Revolutionary period; one on the Civil War; two 
on 20th century immigration; and one on the Depression Period. These
tasks connect to the California History-Social Science Framework 
(1988). Replications in the areas of science (Baker, Niemi, Novak, & 
Herl, in press) and economics (Baker, 1991) have been conducted to 
assess the utility of the scoring rubric for explanation tasks in other 
content areas. 

What CRESST Has Learned 

Over the past several years of research on this project, CRESST has:

1. developed a valid scoring scheme for assessing deep
understanding of history, generalizable across topics;

2. developed rater training procedures that produce reliable and
valid scoring of student tasks in a limited period. The scoring
rubric makes strong cognitive demands of the raters;

3. built a task structure that reduces score variability so that fewer
topics can be used to derive reliable scores for individual
students. This technique is more efficient than that found in most
comparable studies. These relationships are all the more
startling because of the lack of preparation and motivation among
our students;

4. distinguished between assessment purposes and the utility of
overall score and subscores;

5. found gender differences in this small sample, favoring females;

6. found supportive data for the validity of our measures in grade
point average (GPA) and a scale measuring student effort;

7. systematically addressed validity criteria (Linn, Baker, & Dunbar,
1991) in our research studies: the criteria addressed include
fairness, generalizability, cognitive complexity, content quality,
reliability, cost and efficiency. We are in the process of
conducting studies of transfer and designing research to assess the
meaningfulness of tasks to students.

For additional details on the background, development and 
methodology of this research, please contact the CRESST office. 

How Much Do You Know About Chemistry?

Directions: This is a list of terms related to high school chemistry. In the space after each
term, write down what comes to mind, drawing upon your knowledge of chemistry. A brief
definition would be acceptable, or a brief explanation of why that law, principle, concept, or
procedure is important in explaining chemical phenomena. If a term is general, give both a
general definition as it relates to chemistry and a specific example to show your
understanding, if you can.

Good Example: PERIODIC TABLE — An arrangement of chemical elements based on the 
order of their atomic numbers. Shows variation in most of their properties. Shows a natural 
division of elements into metals and nonmetals, inert gases, atomic weights.

Do not define the term by simply restating the same words. 

Bad Example: ELECTRON LEVEL — The level of the electron. 

Even if you are not sure about your answer, but think you know something, feel free to guess. 

There are probably more items here than you will be able to answer in the time given. Start 
with the ones you know best, and work quickly so that you can answer as many as possible. 
Then go back and answer the ones of which you are less sure. Do not spend too much time on 
one specific item. 

1. density 

2. solubility test



3. conductivity 



4. chemical reaction 



5. base 



6. nucleus 



7. deductive reasoning 



8. conservation of energy 



9. precipitation 



10. fructose 



11. hypothesize 



12. empirical formula 




13. acid



14. experimental control 



15. gas laws 



16. compound 



17. ion 



18. indicator 



19. quantitative analysis



20. hydration 




As an introduction to chemical analysis, a high school chemistry teacher performed an 
experiment for her class. This is a description of what she did. 

"I have two samples of soda," she told the class. "One is regular soda containing sugar
and the other is diet soda which contains an artificial sweetener. I'm going to identify each
sample as diet or regular by doing some chemical tests. As in any chemical testing, I won't
allow myself to taste the samples but will base my decision solely on the chemical and physical
properties of the two samples as determined by the tests."

She began by labeling the samples A and B to help her keep track of the sample she was
testing. She then proceeded by saying, "Since we've been studying the properties of many
different kinds of substances, we know that we often can identify an unknown substance by
performing physical and chemical tests on the substance and observing reactions. For
example, acids turn certain solutions pink, while alkalis turn them green, and neutral
ingredients fail to change the color of the solution. Keeping in mind the chemical properties
of sugar, I'm going to conduct the following tests: the yeast test, the Benedict solution test, a
test using sulfuric acid, a solubility test, a test using salt, and a residue test."

Her first test was the yeast test. She poured equal amounts of each soda into separate 
test tubes and labeled them A and B respectively. One soda reacted with the yeast to give off 
a distinctive odor as well as gas bubbles, and the other did not react in the same way. 

Next she used a Benedict solution test. She began by pouring the indicator (Benedict
solution) into three test tubes. She then added a portion of soda A to one test tube and an equal
portion of soda B to another test tube, making sure to note on each test tube which soda was 
added. The third test tube was a control: nothing was added to the indicator in this test tube. 
She waited, knowing that some substances take a while to react with the indicator. 
Comparing the two test tubes containing soda with the control, she pointed out that a reddish 
precipitate had formed in one of the test tubes. 

For her next test, she mixed sulfuric acid with each of the sodas, handling the acid with
extreme caution. She began by heating the sodas so that most of the liquid evaporated. Then
as she added the sulfuric acid to each sample, she noticed that the acid reacted with one of the
sodas to form a gooey brown substance.

To conduct the solubility test, she poured 100 ml of soda A and 100 ml of soda B into
separate beakers and gradually added equal amounts of sugar to each soda. She stirred the
sodas and waited 15 seconds to see if the sugar dissolved. She found that more sugar dissolved
in one soda than the other.

Next she prepared new samples containing equal amounts of each soda and added equal 
amounts of salt to each sample. She noticed that as salt was added, one soda fizzed more than 
the other. 

Finally, for the residue test, she placed 30 ml of each soda in separate test tubes, placed 
both tubes over a Bunsen burner and heated them until 15 ml evaporated from each. She
noticed that more residue was left in one of the test tubes. 

Upon completing the various tests, she made a chart of the results which looked like this:



Test                      One soda                           The other soda

Yeast test                distinct odor, gas bubbles         no odor, no bubbles

Benedict solution test    reddish precipitate                no precipitate

Sulfuric acid test        produced a gooey brown substance   no gooey brown substance

Solubility test           not much sugar dissolved           a lot of sugar dissolved

Salt test                 not much fizzing                   a lot of fizzing

Residue test              a lot of residue                   not much residue



The teacher ended her demonstration by saying, "With your knowledge of the
properties of sugar and the results of the tests, you now can determine which of these sodas is the
regular and which is the diet."



This task was adapted with permission from one developed and tested by the Connecticut State Department of 
Education. 

"All things to all people" could
be a reasonable subtitle for
performance assessment.
This new type of achievement
measurement is promoted as
revitalizing teaching, reforming curriculum,
and motivating students. Performance
assessment is claimed to be useful for
evaluating programs, improving
instruction, comparing districts, and
evaluating university and job applicants.
Tomorrow's news will probably report
it lowers cholesterol as well.

The hyperbolic expectations associated
with performance assessment have
created a situation paradoxically
analogous to previous testing practices now
in the throes of mass repudiation.
Commercial test producers, you may recall,
consistently reminded users that the
purposes of their traditional (i.e.,
multiple-choice) measures were limited. It was
presumably the school users who
expanded the applications of these
achievement tests from reports of individual
achievement to measures of
accountability. When high stakes became
associated with performance, so the story
goes, then instruction inappropriately
focused on the test. Validity of
interpretation suffered, some believe standards
were lowered, and in any event, these
tests became regarded by many as an
educational scourge.

Now it appears that the educational
community is on the verge of repeating
at least the syntax if not the details of the
same mistake. Its vehicle may be
portfolio assessment, considered one of the
most appealing manifestations of
performance-based assessment. Portfolios of
student accomplishment allow the
collection of a cumulative record of a
student's growth. Following the
metaphor from the visual arts, a portfolio can
include a selection of the student's prized
efforts, a display of virtuosity, and even
progression through developmental
stages, her own Blue Period, for
example.

Portfolios conceived as intensely
personal portraits of accomplishment would
seem to have a number of desirable
consequences. Students would have to
become active in the process and choose

(continued on page 8)

"Just collecting children's work does not accomplish the tasks
of setting standards and measuring learning."

Excerpt from the new CRESST videotape
"Portfolio Assessment and High Technology"



Portfolios, as effective assessment,
should include clearly defined
student standards designed in a
collaboration process that includes
classroom teachers.

This statement is one of the key
messages in "Portfolios and High
Technology," a recently released videotape
produced by UCLA's National Center
for Research on Evaluation, Standards,
and Student Testing (CRESST), with
support from Apple Classrooms of
Tomorrow (ACOT). Join the CRESST
research staff, including Eva Baker, in
this 10-minute videotape as they
examine the important issues and uses of
portfolio assessment including:

• Theoretical needs for im- 
proved assessment; 

• Student use of portfolios in 
the classroom; 

• Improved teacher motivation
through the portfolio
process;

• Teacher workshops that help 
to define and select stan- 
dards; 

• Involvement of parents in 
the portfolio process; and 

• Use of technology to
promote good writing.



Demonstrating effective use of
classroom technology, the tape shows
students engaged in a variety of activities
including:

• Selecting best pieces of class- 
room work; and 

• Development of computer- 
based portfolios. 

Useful for school districts, principals,
and teachers interested in starting or
improving their own portfolio programs,
this tape will also interest researchers
who want more information about the
latest CRESST research into portfolio
assessment. Although this videotape
emphasizes technology, the content is
applicable to nearly all portfolio programs.

Cost of the tape is $10.00 and it may
be ordered on page 11.




For those interested in portfolios, the
following technical report will be of special
value.

Writing Portfolios at the
Elementary Level:
A Study of Methods for
Writing Assessment

Maryl Gearhart, Joan Herman,
Eva Baker, & Andrea Whittaker

CSE Technical Report 337, 1992
($4.00)

This study examines the feasibility
of using student portfolios
to evaluate writing competence.
The authors found that analytic
portfolio ratings showed promising levels of
measurement quality, but differences in
assessed level of performance emerged
when portfolio scores were compared to
other assessments. Qualitative analyses
of the scoring process revealed
significant design challenges, particularly in
devising portfolios that reflect classroom
instruction yet are sufficiently uniform
to permit meaningful comparisons within
and between classrooms and schools.

Please use the order form on page 11 or
call Kim Hurst at CRESST, (310) 206-1532,
for a complete list of CRESST
technical reports.

Portfolios As Worthwhile Burdens?



The "bad" news: Portfolios are
indeed a major time and
resource burden on schools,
especially on teachers.

The "good" news: The instructional
and motivational results from portfolios
may lead to important changes in
classroom practices.

Interim results from two research
studies provide evidence to support such
conclusions, at least for large-scale
portfolio systems. In a new report, The
Vermont Portfolio Assessment Program:
Interim Report on Implementation and
Impact, 1991-92 School Year, teachers
and principals reported that
implementing the portfolio program required
considerable time and effort. Even so, many
of them felt that positive classroom
effects were the result.

Authors of the CRESST/RAND
report, Dan Koretz, Brian Stecher, and
Edward Deibert, have been evaluating
the Vermont portfolio program for
almost two years. Vermont was the first
state to make portfolios the backbone of
a statewide assessment system.

The researchers say that support for
the Vermont portfolio program, despite
tremendous demands on teacher time, is
widespread. "Perhaps the most telling
sign of support for the Vermont
portfolio program," wrote the authors, "is that
[even in the pilot year] the portfolio
program had already been extended
beyond the grades targeted by the state."

In a second statewide portfolio
assessment project, the Michigan
Employability Skills Portfolio, teachers and school
officials have also reported increased
demands on their time. As noted by
Michigan educators, one of the key
issues for portfolios is their overall cost
factor: expenses such as duplication
costs, storage space, time for training
teachers, and time for scoring portfolios.
Despite the cost issue, teachers in
Michigan expressed enthusiasm about
portfolios.

"When students ask me why
do I need to learn this, I have
a real answer now. . ."

"When students ask me, as they had in 
the past, why do I need to learn this, I
have a real answer now," said Rita Kirby,
a teacher from Ithaca, Michigan. "I tell
them that the kinds of skills that they are
developing through the portfolio
[program] are skills that are going to be
serving them for the rest of their lives."

Background-Vermont

Unlike most other states, Vermont
had no statewide educational testing
program until a portfolio system was
selected in 1988. Since then, Vermont has
been developing a "cutting edge"
assessment program, the centerpiece of
which is portfolios of students' work
and "best pieces" drawn from them. The
Vermont assessment program currently
includes mathematics and writing
portfolios in grades 4 and 8 and will
eventually encompass a broader range of
subject areas.

Vermont's use of the portfolio results
will be limited compared to some state
and national proposals for
accountability. Although schools and principals may
use the results for assessing individual
student skills, the state will use the
results only as a barometer of school and
district movement towards state goals of
instructional reform. In mathematics,
for example, Vermont wants students
to increase their problem-solving skills,
understanding of patterns and
relationships found in mathematics, and
communication of mathematical
concepts. Early results from the Vermont
research indicate that teachers, in
response to the Vermont assessment
program, are indeed spending
significantly more instructional time on these
specific areas.

Background-Michigan 

In the 1980s, Michigan's economy
was suffering from a serious loss of
manufacturing jobs. To improve the
employability skills of its workforce,
the [Michigan] Governor's
Commission on Jobs and Economic
Development convened the Employability
Skills Task Force in 1987. The task
force's mission was to identify "skills"
essential to new employees entering
the workforce.

After a comprehensive survey of
over 2000 businesses, the task force
published the Employability Skills
Profile, expanding on three previously
identified areas of needed student
achievement: academic, personal
management, and teamwork skills.
Portfolios, at that time already growing in
popularity in some Michigan schools,
were seen as a method of assessing
and encouraging students to enhance
their workforce skills.

Subsequently, a 1991-92 Michigan
school aid act mandated that all school
districts develop and maintain a
portfolio for every student in grades 8-12
during the 1992-93 school year. For
the 1993-94 school year, the
portfolio must be implemented for all 9th
graders and will be extended to 8th
graders in 1994-95.

Piloted in 1990-91 by the
Michigan Department of Education, the
Employability Skills Portfolio is now
in the first year of full
implementation. More than 300 school districts
have agreed to use the portfolio
process to help students focus on
employability skills.

Bottom-Up 



Similar to the United Kingdom's
"Records of Achievement" (see
related article on page 6), the Vermont
and Michigan portfolio programs are 
bottom-up approaches to assessment 
reform. Teachers' support is elicited 
through their inclusion in the devel- 
opment and implementation process. 
The states provide technical advice 
and some funding to assist school 
districts with portfolio development, 
but pressure as to what those portfo- 
lios must include is avoided. 

Based on results in Vermont and
Michigan and similar efforts in the
United Kingdom, the bottom-up
approach has been effective in
generating ground support from teachers,
principals, and school districts.

Resource Constraints 

Despite Michigan and Vermont ef- 
forts to provide training to schools 
and districts, many time and resource 
constraints have been reported. 

Vermont teachers felt that the great- 
est problem created by the portfolios 
"is not about what to do, but when to 
do it." The researchers found that
over 80% of fourth-grade teachers
and over 60% of eighth-grade teachers
often had difficulty covering the required
curriculum. And 60% of both groups
reported that they often lacked the time
to prepare portfolio lessons.

Teachers in Vermont also wanted more
guidance. Seventy-five percent of the
fourth-grade teachers and two-thirds of
the eighth-grade teachers felt they lacked
adequate training at least occasionally.
Even teachers who had taken part in the
previous pilot program reported similar
needs.

"Rating diverse portfolio
entries is problematic. . ."

In Michigan, CRESST is exploring 
efficient ways to describe portfolios with- 
out making an absolute judgement on 
portfolio contents. A portfolio descrip- 
tive system is under development and is 
based on the Michigan employability 
skills survey. 

"Rating diverse portfolio entries is 
problematic," said Eva Baker and
Jonathan Troper, the two CRESST
researchers assisting the Michigan
Department of Education. Referring to the
difficulty of rating portfolios, Baker used
a metaphor of student classroom
performance and athletic achievement.

"Is good team performance," asked 
Baker, "exemplified by being the most 
valuable player on a team, being a player 
on lots of different sport teams, or by
getting an effusive letter from the coach?"

Baker and Troper are helping
Michigan to analyze whether descriptive
information inside a portfolio can lead to a
better understanding of student
performance.



Effects on Instruction 

If the resource, time, and rating
burden of portfolios on teachers and schools
is so great, what makes portfolios any
good? Instructional effects, for one thing.

CRESST/RAND researchers found
that the majority of Vermont educators
believed the assessment program had
already had substantial positive effects
on instruction. Sixty percent of the
surveyed principals felt that the portfolio
program had a positive effect on
instruction although 25% felt that it was too
early to tell. Despite this latter finding,
principals seemed to agree that
"Portfolios are a worthwhile burden."
According to the report:

One relatively frequent comment (16%
of principals) was that teachers increased
their emphasis on problem solving and
"flexible" thinking. Other principals
mentioned specific changes in
instructional methods or styles, including a
lessened reliance on textbooks, less
emphasis on drill and practice, an increased
reliance on hands-on learning, increased
use of interdisciplinary projects, and
increased emphasis on communication of
mathematics.

Teachers also supported positive
instructional effects of the Vermont
portfolio program: More than one-half of
the surveyed teachers said they were
frequently more enthusiastic about
teaching math, and over 90% were more
enthusiastic at least occasionally. Over 40%
(of teachers) reported the following
positive effects: the goals of mathematics
instruction are improved; math is more
closely linked to other subjects;
students' attitudes towards math improve;
and students are learning more math.

Another interesting instructional
phenomenon was that over 80% of the
surveyed teachers in the Vermont study
indicated that they had changed their
opinion of students' mathematical
abilities based upon their students' portfolio
work. In many cases, teachers noted that
students did not perform as well on the
portfolio tasks as on previous classroom
work. This finding, supported by other
performance assessment research,
suggests that portfolios may give teachers
another assessment tool that appears to
broaden their understanding of student
achievement.

Michigan teachers also reported
positive effects on their students:

"We found that portfolios have a lot of 
educational benefits for students that 
aren't related to the assessment," said 
Catherine Smith from the Michigan 
Department of Education. "We're find- 
ing that one thing students begin to 
recognize is that their accomplishments 
outside of school are really important, a 
hobby or a club they belong to, an 
activity with church, or taking care of
siblings. These activities have meaning 
for preparing them for life." 

In Conclusion 

Indeed, the bad news is that portfolios
are definitely burdens in terms of teacher
time and resources. That fact is unlikely
to change. But if portfolios are
adequately funded and lead to significant
improvements in teacher motivation,
instructional processes, self-evaluation,
deeper understanding of content, and
improved skills leading to employment,
then the price may be worth it.

(See pages 10-11 for information on
ordering CSE Technical Report 350,
The Vermont Portfolio Assessment
Program: Interim Report on
Implementation and Impact, 1991-92 School Year,
Koretz, Stecher, and Deibert. The cost is
$4.00.)

A Practical Guide to
Alternative Assessment

"There is no one right way to
assess students," say authors
Joan Herman, Pamela
Aschbacher, and Lynn Winters in their
recently published book, A Practical
Guide to Alternative Assessment. But the
authors suggest that many alternative
assessments offer very appealing ways to
assess complex thinking and
problem-solving skills. And because these new
types of "tests" are grounded in realistic
problems, they are potentially more
motivating and reinforcing for students
than traditional assessments.

Published by the Association for
Supervision and Curriculum Development
(ASCD), A Practical Guide to
Alternative Assessment is written for preservice
and practicing teachers, school
administrators, and district and state level
practitioners who are interested in creating
their own alternative assessments, or in
understanding the issues and improved
methods for assessing student
knowledge.

Within the book's 121 pages, the au- 
thors present a topical guide to alterna- 
tive assessments including chapters on: 

• Rethinking assessment; 

• Linking assessment with in- 
struction; 

• Determining purpose;

• Selecting assessment tasks; 

• Setting criteria; 

• Ensuring reliable scoring; 

• Using alternative assessment 
for decision making.




The authors discuss the
development of alternative assessments within
the context of a unique process model
that links curriculum, learning, and
instruction.

"The authors have reaffirmed the
fundamental role of assessments,"
concludes ASCD President Stephanie
Pace Marshall, "which is to provide
authentic and meaningful feedback
for improving student learning,
instructional practice, and educational
options."

Available through ASCD by calling
(703) 549-9110, A Practical Guide
to Alternative Assessment costs $10.95.

Records of Achievement
Lessons from the United Kingdom

Profiling [Records of Achievement] thus arguably represents a new disciplinary technique which... has the
potential to exercise more effective control than any assessment procedure yet devised.

Patricia Broadfoot 
University of Bristol, U.K. (1990) 




International assessment
researcher Patricia Broadfoot's
preceding statement is
indicative of the high hopes held in many
countries for portfolios, or as they are called in
the United Kingdom, records of
achievement (ROAs). In 1984, the
United Kingdom, considered by many
as the world leader in the
development of performance assessments,
mandated that records of achievement
would be used in all secondary
classrooms by 1990. But today, almost
nine years later, despite extensive
research and development, many U.K.
schools are still struggling with ROA
implementation issues and the U.K.
government has backed away from its
original 1984 requirement.

Are states and major school districts
in the United States, many trying to
create large-scale portfolio systems
similar to the United Kingdom's
ROAs, headed down a similar path?

A brief comparison between the 
U.K. and state and local portfolio 
systems indicates that, at a minimum, 
the U.S. already faces many of the 
same portfolio implementation prob- 
lems as have confronted our friends 
from across the pond. 

Commonalities 



ROAs and large-scale portfolio
systems in the United States are not
identical. ROAs tend to have a more
standardized format and cover
achievement across and beyond the whole
curriculum. Nevertheless, ROAs and
large-scale U.S. portfolios do share
many commonalities. Consider the
Records of Achievement National
Steering Committee (1989), which listed the
purposes of ROAs:

• to contribute to the raising of all 
pupils' achievement through 
and beyond the national curric- 
ulum; 

• to improve [students']
motivation;

• to prepare [students] for the 
transition to further education, 
training and employment; and 

• to help schools to consider how 
well their curriculum, teaching 
and organization enable pupils 
to develop their all around po- 
tential. 

A comparison of these purposes to the
large-scale Michigan portfolio effort (see
page 3 article) indicates similar purposes.
For example, the Michigan
employability goals call for students to develop:

• a new and higher order of aca- 
demic skills; 

• Personal Management Skills that 
allow them [students] to devel- 
op and demonstrate the atti- 
tudes, abilities, behaviors and 
decision-making processes as- 
sociated with responsibility and 
dependability; 

• teamwork skills that enable them 
[students] to function effectively 
as members of multiple work 
teams and contribute to groups 
in accomplishing work tasks. 



Both U.K. and Michigan goals em- 
phasize improved student academic skills, 
motivation and attitude, and teamwork. 
Other similarities between records of 
achievement and many large-scale U.S. 
portfolio projects include the following 
concepts. Portfolios and ROAs: 

• are valued documents meant to 
provide more and broader in- 
formation to parents than tradi- 
tional report cards; 

• are owned by the pupil; 

• often have as their end goal im- 
proved employment opportu- 
nities; 

• feature a bottom-up design; and 

• are usually implemented with 
minimal funding increases from 
mandating organizations. 

If there is a major difference between 
ROAs and current large-scale portfolio 
systems here, it might be that the U.K. 
uses the ROAs as extended resumes for 
entrance into the workforce. But the 
Michigan Employability Portfolios could 
make even that difference less significant 
in the future, as could the demand by 
employers across the country to improve 
evidence of student workforce skills and 
for changing the way children are edu- 
cated. 

The problematic R&D issues that con- 
front portfolio proponents in both coun- 
tries similarly overlap. Issues of 
equity, issues of knowing if portfolios or 
ROAs really make a difference in student 
learning, and issues of what makes a 
good portfolio have challenged both 
countries. Perhaps the greatest problem 
yet to be resolved by either the U.K. or 
any U.S. state is that of inadequate re- 
sources, especially teacher time. For ex- 
ample, the Records of Achieve- 
ment National Steering Committee 
(1989) stated the time problem suc- 
cinctly: 

It is nevertheless clear that the 
overall volume of teacher time 
is the main call [demand] that 
records of achievement systems 
make on schools' resources. In 
particular, tutorial time needs 
to be found for pupil/teacher 
discussions and for the prepara- 
tion of summary documents. It 
will be essential for time to be 
found for in-service training for 
records of achievement which is 
carefully integrated with train- 
ing needs arising from other 
related activities. 

Both Vermont and Michigan have rec- 
ognized the same major resource prob- 
lems in their statewide portfolio pro- 
grams. 

What's Happening Now 

The 1984 United Kingdom govern- 
ment policy statement and the 1989 
Records of Achievement National Steer- 
ing Committee established a deadline of 
1990 to introduce the records of achieve- 
ment into all U.K. secondary schools. 
But according to Desmond Nuttall, one 
of the members of the 1989 Steering 
Committee and a leader of the British 
assessment reform movement, the gov- 
ernment has backed away from its 1984 
mandate. Overwhelmed by its recently 
established national curriculum and as- 
sessments, the government decided to 
continue its policy of voluntary records 
of achievement. Nevertheless, approxi- 
mately 80% of the secondary schools 
have implemented the ROAs into their 
classrooms, and many primary and middle 
schools have followed suit. 

Nuttall also reports another interest- 
ing development. The primary pressure 
for the ROAs has come most recently 
from the U.K. Department of Employ- 
ment and not their Department of Edu- 
cation, which had previously promoted 
the ROAs. The Department of Employ- 
ment is encouraging students to use the 
ROAs during job interviews. But Nuttall 
adds that the original plan, for all stu- 
dents to use the ROAs for employment, 
has not happened. 

Scoring of the ROAs, or lack thereof, 
has not become a major issue in the 
U.K., as it is becoming for some large- 
scale assessments in the United States. 
Nuttall attributes this fact to their cur- 
rent national assessment system which 
does not consider ROAs to be a major 
test instrument. 

Lessons 

If there is a lesson for the United States 
in the U.K. experience, it might be that 
the implementation of portfolios is not 
an overnight process. Having mandated 
records of achievement almost nine years 
ago, the British are still struggling with 
the same types of questions faced today 
by state and local agencies in the United 
States: questions of policy, costs, train- 
ing, scoring, and use by employers, 
teachers, and schools. 

And there are many potential pitfalls 
that have been discussed but not fully 

(continued on page 8) 




What Is A 
Record of Achievement? 



To enhance employer recognition of the 
Records of Achievement and to stan- 
dardize the ROA basic design and 
contents, the United Kingdom de- 
veloped a "National Record of 
Achievement" notebook. Each 
student's notebook has a nationally 
recognizable, standardized cover, an 
initial sheet describing personal de- 
tails, and four main pages of records 
including: 

• a summary of school 
achievements in the curric- 
ulum; 

• a summary of qualifications 
and credits; 

• a summary of other achieve- 
ments and experiences; and 

• a personal statement. 

The Record also contains a sheet for 
students to record their employment 
history. 

Compact in design and incapable of 
holding massive amounts of student 
work, the ROA focuses on evidence 
of student achievement including: 
examination results, qualifications, 
achievements through work experi- 
ence, non-academic interests, future 
plans, and needs of the student. The 
U.K. places emphasis on having students 
eliminate old material. 

Records of Achievement... 

(from page 7) 



From the Directors. 



(from page 1) 




researched. For example, if portfolios 
are ever to be used for such high stakes 
decisions (Brandt, 1992) as college 
entrance, they will undoubtedly have 
to meet the same types of stringent 
validity and reliability measures as cur- 
rent standardized assessments. Even 
the most ardent proponent of portfo- 
lios would have to agree that portfolios 
are a long way off from that level of 
rigor. Ultimately, the portfolio road 
may be a good one, but there remain 
many important issues to be resolved. 

Our very special thanks to Professor 
Desmond Nuttall, Institute of Educa- 
tion, University of London, for provid- 
ing CRESST with valuable Records of 
Achievement information. 

tasks rather than simply react to the tasks 
assigned by others. Teachers could help 
students accomplish significant tasks, 
worthy of time, reflection, and refine- 
ment. Portfolios and their evolving con- 
tents could be a source of pride for 
students and for their parents. As one of 
our relatives says, "What could be bad?" 

Multiple Purposes 

Although starting from the intimacy 
of the individual teaching/learning situ- 
ation, portfolio use is rapidly transmut- 
ing to a multitude of purposes. Effi- 
ciency principles suggest we should sup- 
port multiple goals met with single in- 
terventions. 

But clearly not all uses will occur 
without side effects. For instance, should 
portfolios be used, as proposed in Michi- 
gan, as displays to assist employers to 
make a hiring decision? Why not, if they 
are promoted as a form of elaborated 
resume? Again, the process mirrors the 
teacher-student relationship, with rela- 
tively idiosyncratic standards applied for 
particular job options. The portfolio, 
along with other indicators, allows the 
employer to make a choice. The caveat is 
that children in different schools need to 
have the same level of assistance in the 
development of their portfolio. 

Could portfolios provide an exhibit 
for the public and policy makers of the 
type of curricular emphases occurring in 
local schools and classrooms? Certainly 
a sample for review at critical grades 
could be made available as exhibits in 
displays for parents or for school board 
members. 

But some portfolio enthusiasts have 
bigger aspirations. They seek to have 
portfolios used for student and school 
comparisons for accountability purposes. 
Portfolios would be graded to give stu- 
dents individual scores, to judge sys- 
tems' progress toward achieving stan- 
dards, and to evaluate programs. In 
order to accomplish these worthy ends, 
portfolios need to be assessed according 
to common standards to ensure fairness. 
It is the scoring of portfolios, and the 
concomitant stakes assigned to them, 
that triggers our concern. 

How will portfolios be judged? If they 
are rated by explicit guidelines provided 
for their judgment, obvious conse- 
quences may occur. One is that the 
surface features of the scoring system 
will drive the portfolio development to- 
ward more superficial, and incidentally, 
homogeneous performance. Individual 
reflection and choice could be given a 
back seat to making sure that particular 
features are included, that is, buttons 
pushed, to get a "high" score. Students 
with savvy parents and teachers will surely 
do well. 

Issues of Scoring 

One alternative procedure is to use 
global scores, a 4 for excellent, a 3 for 
competent, and so on, instead of explicit 
scoring rubrics. Such summary scores, of 
course, operate against the formative, 
interactive strategy that portfolio assess- 
ment is supposed to promote. When 
global ratings are given with a lack of 
models or explicit criteria, what is likely 
to be detected are gross differences in 
individual talent and experience. Such 
scores would not help teachers to im- 
prove teaching and learning. They would 
function like a qualitative stanine. 

Despite the frequent Olympic games 
allusions to judgment of qualitative per- 
formance, portfolios are in a different 
arena. In ice skating, for instance, even in 
the most creative events, there are known 
expectations, for example, how many 
jumps of one or another type and diffi- 
culty must be made. In portfolios, the 
performance land of video collage, po- 
etry, spreadsheets, and civic projects, we 
have no agreed upon components and 
few standards for any required pieces. 

Do portfolios have a place in account- 
ability? Maybe they do. On a sampling 
basis, portfolios can perhaps give good, 
qualitative information about what is 
happening in the best and in more typi- 
cal classrooms. Do portfolios need to be 
scored, with all the attendant issues of 
reliability, valid rubrics, cost and time? 
Maybe; but maybe not. In a project in 
the state of Michigan, CRESST staff, the 
Michigan Department of Education, 
teachers, and administrators are trying 
to see if portfolios can be reviewed de- 
scriptively rather than scored. Our pre- 
liminary work suggests that, for accountabil- 
ity purposes, this approach might pro- 
vide sufficient information and at a re- 
duced cost. 

Appropriate Uses 

Let's remember portfolios fundamen- 
tally are intended to provide qualitative 
information on a rich, diverse, unpre- 
dictable, and most importantly, indi- 
vidual set of performances. Let's not lose 
these key goals by converting portfolios 
mindlessly to inappropriate sources of 
quantitative information — at least not 
without monitoring the effects of those 
actions on teaching and learning. Some 
things are good for only one (or a few) 
purposes. 



CSE/CRESST Reports 





The following reports have recently been released and are available through 
the CSE/CRESST office. To order any report, fill out the order form on 
page 11, or for a complete listing of all CSE/CRESST technical reports, 
monographs and resource papers, please contact Kim Hurst at (310) 206-1532. 

CRESST Performance Assessment Models: 
Assessing Content Area Explanations 

Eva Baker, Pamela Aschbacher, David Niemi, and Edynn Sato, 1992 ($10.00) 

Over 500 copies of this handbook have been distributed since its publication 
in April, 1992; consequently, we are once again promoting its use by anyone 
interested in developing performance assessments. Presenting a performance- 
based approach to assessing students' understanding of subject matter content, 
the handbook includes: 

• a concise model of alternative assessment for those who need to de- 
velop similar assessments of their own; 

• examples of successful CRESST assessment materials; 

• an effective scoring rubric for performance assessments applicable to a 
variety of topics; 

• useful benchmark papers. 

The assessment model, based on a highly contextualized history performance 
task, requires students to engage in a sequence of assessed steps, beginning with 
an initial assessment of their relevant background knowledge of the particular 
historical period. Next, students are provided with opposing viewpoints in 
primary source materials, typically letters or speeches of historical figures. Finally, 
students are asked to write an extended essay that explains the positions of the 
authors of the texts and to draw upon their own background knowledge for 
explanation. 

The essay scoring rubric consists of six dimensions: a General Impression of 
Content Quality scale, and five analytic subscales. History experts and high 
school teachers have been involved throughout the study as co-designers, 
reviewers, and raters of the assessment and have provided valuable input into the 
assessment. 




Also included in the handbook are: 
background information on the 
CRESST performance-based assess- 
ment, examples of assessments for 
secondary-level history and chemis- 
try, and specifications for duplicating 
our technique with other topics and 
subject matter areas. Our rater train- 
ing process, scoring techniques, and 
methods for reporting results are de- 
scribed in detail. 

CRESST believes that this assess- 
ment will be useful for both large- 
scale applications and instructional 
improvement. Having used this as- 
sessment approach to research a num- 
ber of technical issues in performance 
assessment, CRESST has evidence of 
the reliability, validity, and generaliz- 
ability of its technique. 

The Vermont Portfolio Assessment 
Program: Interim Report on Imple- 
mentation and Impact, 1991-92 
School Year 

Daniel Koretz, Brian Stecher, Ed- 
ward Deibert 

CSE Technical Report 350, 1992 
($6.00) 

See page 3 for a complete article on 
this report. 

Design Characteristics of Science 
Performance Assessments 

Robert Glaser, Kalyani Raghavan, 
& Gail Baxter 

CSE Technical Report 349, 1992 
($3.00) 

Part of a long range goal to investi- 
gate the validity of reasoning and 
problem-solving assessment tasks in 
science, this report describes progress 
in analyzing several science perfor- 
mance assessment projects. The au- 
thors discuss developments from 
Connecticut's Common Core of 



Learning Assessment Project, the Cali- 
fornia Assessment Program, and the Uni- 
versity of California, Santa Barbara/Cali- 
fornia Institute of Technology research 
project "Alternative Technologies for 
Assessing Science Understanding." The 
analysis framework articulates general 
aspects of problem-solving performance, 
including structured, integrated knowl- 
edge; effective problem representation; 
proceduralized knowledge; automatic- 
ity; and self-regulatory skills. 

Accountability and Alternative As- 
sessment 
Joan Herman 

CSE Technical Report 348, 1992 
($4.00) 

Despite growing dissatisfaction with 
traditional multiple-choice tests, national 
and state educational policies reflect con- 
tinuing belief in the power of good 
assessment to encourage school improve- 
ment. The underlying logic is strong. 
Good assessment sets meaningful stan- 
dards, and these standards provide di- 
rection for instructional efforts and mod- 
els of good practice. But are these rea- 
sonable assumptions? How close are we 
to having the good assessments that are 
required? 

This report summarizes the research 
evidence supporting current beliefs in 
testing, identifies critical qualities that 
good assessment should exemplify, and 
reviews the current state of the research 
knowledge on how to produce such 
measures. 

The Influence of Problem Context 
on Mathematics Performance 

Noreen Webb & Esther Yasui 

CSE Technical Report 346, 1992 

($4.00) 

Mathematics educators and research- 
ers argue that using realistic, complex 
problem-solving instruction and assess- 
ment can improve students' problem- 
solving skills and attitudes towards math- 
ematics. The objectives of this study 
were: (a) to determine whether working 
with more realistic and lengthier prob- 
lems during instruction will make stu- 
dents better able to solve similar prob- 
lems on an achievement test, and (b) to 
determine whether the different kinds of 
problems (short vs. extended word prob- 
lems) will provide different information 
about students' performance and math- 
ematical problem-solving ability. The 
comparisons suggest that there are im- 
portant aspects of students' ability to 
solve structured problems that can be 
measured with extended, complex, real- 
istic problems. 

Measurement of Workforce Readi- 
ness: Review of Theoretical Frame- 
works 

Harold F. O'Neil, Jr., Keith Allred, & 
Eva L. Baker 

CSE Technical Report 343, 1992 
($4.00) 

The cry of American management for 
workers with greater skills has spawned 
many commissions, task forces and stud- 
ies, including five studies reviewed in 
this report: 

• What Work Requires of 
Schools (Secretary's Commis- 
sion on Achieving Necessary 
Skills); 

• Workplace Basics: The Essen- 
tial Skills Employers Want 
(American Society for Train- 
ing and Development); 

• Michigan Employability Skills 
Employer Survey; 

• Basic and Expanded Basic 
Skills (New York State Educa- 
tion Department); and 

• High Schools and the Chang- 
ing Workplace: The Employ- 
ers' View (National Academy 
of Sciences). 



Special CRESST Report from Robert L. Linn 

Educational Assessment: Expanded Expectations and Challenges 
(1992 Thorndike Award Address) 
Robert Linn 

CSE Technical Report 351, 1992 ($3.50) 

"[...] policymakers are keenly interested in educational assessment," says Robert L. Linn in his 1992 Thorndike 
Award address to the American Psychological Association. Linn points to the various attractions that assessments 
have for policymakers, who frequently think of assessment as a "kind of impartial barometer of educational quality." 
But assessments are frequently used for two questionable purposes, implies Linn: first, to point out the declining quality 
of American education and, second, as an instrument of educational reform. "Such greatly expanded, and sometimes 
unrealistic, policymaker expectations," he says, "together with the current press for radical changes in the nature of 
assessments, represent major challenges for educational measurement." Linn concludes his remarks by saying that the 
measurement research community must make sure that the consequences for any new high-stakes performance assessment 
system are better investigated than they were for previous assessment reforms. 



Linda Winfield Joins CRESST Staff 



CRESST is delighted to have 
Professor Linda F. Winfield 
join its research staff. Dr. 
Winfield, formerly a principal research 
scientist at the Center for Research on 
Effective Schooling for Disadvantaged 
Students at Johns Hopkins University, 
is currently a visiting professor at UCLA's 
Graduate School of Education. While at 
Johns Hopkins, she was also co-director 
for "Special Strategies for Educating 
Disadvantaged Students," a congressionally 
mandated, national study of 
"exemplary" urban schools. 

Dr. Winfield's published work in- 
cludes numerous articles focused on re- 
search and policies in urban education, 
including Chapter I evaluation, imple- 
mentation and change in schoolwide 
projects, assessment of students from 
diverse populations, and equity. She has 
received support from the Rockefeller 
Foundation and the National Science 
Foundation for her work on literacy 
proficiency among black young adults. 



Center for Research on Evaluation, 
Standards, and Student Testing 
Eva L. Baker, Co-director 
Robert L. Linn, Co-director 
Joan L. Herman, Associate Director 
Ronald Dietel, CRESST Line Editor 

The work reported herein was supported under 
the Educational Research and Development Cen- 
ter Program cooperative agreement number 
R117G10027 and CFDA catalog number 
84.117G as administered by the Office of Edu- 
cational Research and Improvement, U.S. De- 
partment of Education. The findings and opin- 
ions expressed in this publication do not reflect 
the position or policies of the Office of Educa- 
tional Research and Improvement or the U.S. 
Department of Education. 

To be placed on the CRESST Line 
mailing list please write to CRESST 
Line, UCLA Graduate School of Edu- 
cation, 405 Hilgard Ave., Los Angeles, 
CA 90024-1522. 



Dr. Winfield is currently teaching the 
Introduction to Educational Evaluation 
course at UCLA's Graduate School of 
Education. She is collaborating with the 
CSE evaluation of a New American 
Schools Development Corporation 
project, "The Los Angeles Learning 
Centers," and will be involved in several 
CRESST projects involving equity and 
validity issues of performance assess- 
ments. 




Linda Winfield 



Rebuild L.A. 




CRESST/UCLA contributed over $1700 to the Rebuild L.A. 
effort. L-R: CRESST Project Director Josie Bain, Rev. Cecil Murray 
and Rev. Carmen Speights from the First African Methodist Episco- 
pal Church, and CRESST Director of Communications Ron Dietel. 



Construction Versus Choice in Cognitive Measurement: 

Issues in Constructed Response, Performance Testing, 
and Portfolio Assessment 



Table of Contents and Preface 



This book is based on the major presentations of a conference held at Educational 
Testing Service in November 1990. The first chapter explores the meanings of "constructed 
response" within a framework provided by validity theory. The next three chapters discuss 
the construct validity of constructed-response measures from psychometric, psychological, 
and integrated perspectives. The chapters in the following group address measurement tech- 
niques that will contribute to the incorporation of constructed-response measures into stan- 
dardized assessments. Within the next group of chapters, attention turns to discussions of 
more extended assessment exercises, such as portfolios. The next chapter uses the assessment of 
teachers to illustrate issues in the reform of educational measurement. It provides a transition 
to the final chapter, which focuses on policy questions: the federal government's role, and 
the conflicting perspectives that influence decision making. 

The book was edited by Randy Elliot Bennett and William C. Ward of Educational 
Testing Service. 



Copies of the book are available from: 

Lawrence Erlbaum Associates, Inc., Publishers 



Preface 



The multiple-choice question is the mainstay of standardized testing programs 
in the United States. The format has achieved this position because it per- 
mits inexpensive and apparently objective scoring; because such questions 
can be answered quickly, allowing broad content coverage within a testing 
session; and because a sophisticated statistical technology has evolved to sup- 
port the analysis and interpretation of test results. 

The reliance on multiple-choice questions, however, is increasingly criti- 
cized. Many have argued that tests and, in particular, test formats significantly 
influence education. Multiple-choice assessments are said to encourage the 
teaching and learning of isolated facts and rote procedures at the expense 
of conceptual understanding and the development of problem-solving skills. 
It is believed that, for education reform to occur, the methods used to meas- 
ure attainment must themselves be transformed. 

To address the limitations of the multiple-choice format, many educators 
and psychologists have advocated increased use of constructed-response tasks. 
These tasks may be as simple as producing a numerical answer to an arith- 
metic question or as extensive as producing the numerous drafts that cul- 
minate in a finely honed essay or planning and conducting a series of scientific 
[...]. Proponents [...] that constructed-response assessments, es- 
pecially [...], yield complex 
[...] than do [...]. 

The use of such tasks, however, raises several critical concerns. If the an- 
swer to a question is an extended problem solution, fewer questions can be 
asked in a fixed testing period, reducing the breadth of content coverage 
possible. The less constrained the task and the solution, the greater is the 
possibility that lack of standardization in test administration, and lack of ob- 
jective criteria for evaluation, may adversely affect the comparability of results 
across persons and situations. These conditions can threaten the representa- 
tiveness of the test results as a sample of the individual's capabilities, and 
thus the validity and fairness of the test. 

Growing attention to these and related issues has suggested that it would 
be timely to bring together persons who could contribute to an understand- 
ing of problems and possibilities associated with the various assessment for- 
mats. First a conference, and then this volume, resulted. 

The conference, sponsored by Educational Testing Service, was held in 
Princeton, New Jersey, in November, 1990. Speakers and attendees represent- 
ed a variety of viewpoints in educational research and policymaking. The 
presentations and discussions were informative, provocative, and notably lack- 
ing in polemics. 

This book, comprising nine chapters based on the major conference presen- 
tations plus five newly invited contributions, maintains the same tone. Rhetoric 
calling for the abolition of traditional testing methods as useless or perni- 
cious, or on the other hand for dismissal of new approaches as imprac- 
tical, is lacking. Rather, the authors seek to provide perspectives and build 
frameworks that will contribute to future research agendas and policy de- 
bates. Such statements are not as dramatic as the more extreme positions 
that can be found in the press and even in journals, but they are, we believe, 
more useful. 

The first chapter in the volume, that by Bennett, explores the meanings 
of "constructed response" within a framework provided by validity theory. 
The next three chapters discuss the construct validity of constructed-response 
measures. Traub provides a psychometric perspective; Snow, a psychologi- 
cal one; and Messick, an integration of the two. 

The chapters in the following group address measurement techniques that 
will contribute to the incorporation of constructed-response measures into 
standardized assessments. Mislevy outlines the use of "inference networks" 
in evaluating the contributions of different types of test questions. Tatsuoka 
discusses a model for item design to elucidate the skills and knowledge un- 
derlying observable performance. Dorans and Schmitt describe techniques 
for the analysis of group differences in item performance. Finally, Braswell 
and Kupin examine alternative formats for assessment in mathematics. 

With the next group of chapters, attention turns to discussions of more 
extended assessment exercises. Camp explores the role of portfolios in the 
assessment of writing. Wolf draws from both the classroom and the reflec- 
tions of practicing artists to view assessments as occasions of learning. Gitomer 
provides a framework for the design of performance assessments in educa- 
tional measurement. 

Dwyer's chapter uses the assessment of teachers to illustrate issues in the 
reform of educational measurement. It provides a transition to the final chap- 
ters in the volume, which focus on questions of policy: Hartle and Battaglia 
from the perspective of the federal government's role, and Robinson explor- 
ing the conflicting perspectives that influence decision making. 

Important contrasts between the more narrowly psychometric and the so- 
cial policy perspectives are evident in these chapters. The two viewpoints 
are in agreement in seeking means of improving educational measurement; 
but they differ, at least implicitly, in what is meant by "better." From the 
policy perspective, better measurement involves tasks that have verisimili- 
tude, that send the right messages to those concerned with education, and 
that help directly and indirectly to cause increased success for learners. From 
the psychometric, "better" means more reliable or more representative of 
cognitive skills underlying an achievement, or perhaps less susceptible to con- 
tamination by construct-irrelevant group differences. From the first of these 
perspectives, it may make good sense to trade some accuracy of measure- 
ment for a superior assessment; from the second, that proposition is almost 
a contradiction in terms. 

Another aspect of the contrast in perspectives is that there are significant 
differences in how the line is drawn to distinguish variations in measurement 
methodology that make a difference. From the psychometric viewpoint, the 
step from a multiple-choice mathematics question to one in which the ex- 
aminee is asked to grid an answer in is a very big change; one has to be 
concerned about the consequences of this change for test reliability, difficulty, 
speededness, and so on. Any variation in format and scoring rubric must be 
studied exhaustively. From the policy perspective, however, such changes 
are minor. The constructed-response measures that are seen as likely to make 
a difference are far more complex and real-worldly, barely on the same con- 
tinuum with the array of measures likely to be considered by those for whom 
such factors are the critical concerns. 

Just as evident as the differences should be the indication of ways in which 
these contrasts might be bridged. Several of the chapters offer organizing 
schemes and discussions that can begin the synthesis needed to promote the 
objective shared by all of the contributors: achieving more socially useful, 
socially responsible measurement. We hope this volume contributes, if only 
in a small way, to that important goal. 

Randy Elliot Bennett 
William C. Ward 